THE ANNALS
of
MATHEMATICAL
STATISTICS

(FOUNDED BY H. C. CARVER)
THE OFFICIAL JOURNAL OF THE INSTITUTE OF
MATHEMATICAL STATISTICS


VOLUME XIX 





THE ANNALS
OF MATHEMATICAL STATISTICS

EDITED BY
S. S. WILKS, Editor

M. S. BARTLETT          HARALD CRAMER          J. NEYMAN
WILLIAM G. COCHRAN      W. EDWARDS DEMING      WALTER A. SHEWHART
ALLEN T. CRAIG          J. L. DOOB             JOHN W. TUKEY
C. C. CRAIG             W. FELLER              A. WALD
HAROLD HOTELLING

WITH THE COOPERATION OF

T. W. ANDERSON, JR.     CHURCHILL EISENHART    H. B. MANN
DAVID BLACKWELL         M. A. GIRSHICK         ALEXANDER M. MOOD
J. H. CURTISS           PAUL R. HALMOS         FREDERICK MOSTELLER
J. F. DALY              PAUL G. HOEL           H. E. ROBBINS
HAROLD F. DODGE         MARK KAC               HENRY SCHEFFÉ
PAUL S. DWYER           E. L. LEHMANN          JACOB WOLFOWITZ
WILLIAM G. MADOW


The ANNALS OF MATHEMATICAL STATISTICS is published quarterly by the
Institute of Mathematical Statistics, Mt. Royal & Guilford Aves., Baltimore 2,
Md. Subscriptions, renewals, orders for back numbers and other business
communications should be sent to the ANNALS OF MATHEMATICAL STATISTICS, Mt.
Royal & Guilford Aves., Baltimore 2, Md., or to the Secretary of the
Institute of Mathematical Statistics, P. S. Dwyer, 116 Rackham Hall, University of
Michigan, Ann Arbor, Mich.


Changes in mailing address which are to become effective for a given 
issue should be reported to the Secretary on or before the 15th of the 
month preceding the month of that issue. The months of issue are March, 
June, September and December. 


Manuscripts for publication in the ANNALS OF MATHEMATICAL STATISTICS
should be sent to S. S. Wilks, Fine Hall, Princeton, New Jersey. Manuscripts
should be typewritten double-spaced with wide margins, and the original copy
should be submitted. Footnotes should be reduced to a minimum and whenever
possible replaced by a bibliography at the end of the paper; formulae in
footnotes should be avoided. Figures, charts, and diagrams should be drawn on
plain white paper or tracing cloth in black India ink twice the size they are to
be printed. Authors are requested to keep in mind typographical difficulties
of complicated mathematical formulae.


Authors will ordinarily receive only galley proofs. Fifty reprints without 
covers will be furnished free. Additional reprints and covers furnished at cost. 


The subscription price for the ANNALS is $8.00 inside the Western Hemi- 
sphere and $5.00 elsewhere. Single copies $3.00. Back numbers are available 
at $8.00 per volume or $3.00 per single issue. 


COMPOSED AND PRINTED AT THE
WAVERLY PRESS, INC.
BALTIMORE, MD., U. S. A.


Entered as second-class matter at the Post Office at Baltimore, Maryland, under the act of March 3, 1879 










Contents

On the Generalized “Birth-and-Death” Process. DAVID G. KENDALL

Probability of Coincidence for Two Periodically Recurring Events.
PAUL I. RICHARDS

Non-Parametric Estimation, III. Statistically Equivalent Blocks
and Multivariate Tolerance Regions--The Discontinuous
Case. JOHN W. TUKEY

Asymptotic Properties of the Maximum Likelihood Estimate of
an Unknown Parameter of a Discrete Stochastic Process.
ABRAHAM WALD

Distribution of a Root of a Determinantal Equation. D. N. NANDA

A k-Sample Slippage Test for an Extreme Population. FREDERICK
MOSTELLER

On the Uniqueness of Similar Regions. PAUL G. HOEL

Notes:
Convergence of Distributions. HERBERT ROBBINS
On Random Variables with Comparable Peakedness. Z. W. BIRNBAUM
A Method for Obtaining Random Numbers. H. BURKE HORTON
Note on the Error in Interpolation of a Function of Two Independent
Variables. WILFRED M. KINCAID
On a Lemma by Kolmogoroff. KAI-LAI CHUNG
Approximate Weights. JOHN W. TUKEY
On the Use of the Non-Central t-Distribution for Comparing Percentage
Points of Normal Populations. JOHN E. WALSH

The Teaching of Statistics, Report of the Institute of Mathematical
Statistics Committee on the Teaching of Statistics

Abstracts of Papers

Book Reviews

News and Notices

Report on the Berkeley Meeting of the Institute

Report on the New York Meeting of the Institute

Report on the Chicago Meeting of the Institute

Annual Report of the President of the Institute

Annual Report of the Secretary-Treasurer of the Institute

Report of the Editor

Constitution and By-Laws of the Institute

Vol. XIX, No. 1 — March, 1948 
















ON THE GENERALIZED “BIRTH-AND-DEATH” PROCESS 


By DAVID G. KENDALL
Magdalen College, Oxford 


1. Introduction and Summary. The importance of stochastic processes in
relation to problems of population growth was pointed out by W. Feller [1]
in 1939. He considered among other examples the “birth-and-death” process
in which the expected birth and death rates (per head of population per unit of
time) were constants, $\lambda_0$ and $\mu_0$, say. In this paper I shall give the complete
solution of the equations governing the generalised birth-and-death process
in which the birth and death rates $\lambda(t)$ and $\mu(t)$ may be any specified functions
of the time $t$. The mathematical method employed starts from M. S. Bartlett’s
idea of replacing the differential-difference equations for the distribution of the
population size by a partial differential equation for its generating function. For
an account of this technique,$^{1}$ reference may be made to Bartlett’s North Carolina
lectures [2].

The formulae obtained lead to an expression for the probability of the ultimate
extinction of the population, and to the necessary and sufficient condition for a
birth-and-death process to be of “transient” type. For transient processes
the distribution of the cumulative population is also considered, but here in
general it is not found possible to do more than evaluate its mean and variance
as functions of $t$, although a complete solution (including the determination of
the asymptotic form of the distribution as $t$ tends to infinity) is obtained for the
simple process in which the birth and death rates are independent of the time.

It is shown that a birth-and-death process can be constructed to give an
expected population size $\bar{n}_t$ which is any desired function of the time $t$, and among
the many possible solutions the unique one is determined which makes the
fluctuation, $\mathrm{Var}(n_t)$, a minimum for all $t$.

The general theory is illustrated with reference to two examples. The first
of these is the $(\lambda_0, \mu_1 t)$ process introduced by N. Arley [3] in his study of the
cascade showers associated with cosmic radiation; here the birth rate is constant
and the death rate is a constant multiple of the “age”, $t$, of the process. The
$\bar{n}_t$-curve is then Gaussian in form, and the process is always of transient type.

The second example is provided by the family of “periodic” processes, in
which the birth and death rates are periodic functions of the time $t$. These
appear well adapted to describe the response of population growth (or epidemic
spread) to the influence of the seasons.


$^{1}$ It appears from some remarks by Arley and Borchsenius [5] that the generating
function method was first employed in problems of this kind by Dr. C. Palm.


2. The formulation and solution of the equations for the general $(\lambda, \mu)$ process.
Let the integer-valued time-dependent random variable $n_t$ measure at time $t$ the
size of a population, and suppose that in an element of time $dt$ the only possible
transitions (and their associated probabilities) are:

$$\begin{aligned}
n_{t+dt} &= n_t + 1, & &\lambda(t)\,n_t\,dt + o(dt); \\
n_{t+dt} &= n_t, & &1 - \{\lambda(t) + \mu(t)\}\,n_t\,dt + o(dt); \\
n_{t+dt} &= n_t - 1, & &\mu(t)\,n_t\,dt + o(dt).
\end{aligned} \tag{1}$$


As an initial condition it will be supposed that the population is descended from
a single “ancestor”, so that $n_0 = 1$, and thus

$$P_1(0) = 1, \qquad P_n(0) = 0 \quad (n \neq 1). \tag{2}$$

It then follows that the $P_n(t)$ must satisfy the differential-difference equations

$$\frac{\partial}{\partial t} P_n(t) = (n+1)\mu P_{n+1}(t) + (n-1)\lambda P_{n-1}(t) - n(\lambda+\mu)P_n(t), \qquad n \ge 1, \tag{3}$$

and

$$\frac{\partial}{\partial t} P_0(t) = \mu P_1(t) \tag{4}$$


(where for convenience of writing I have ceased to indicate explicitly the
dependence of $\lambda$ and $\mu$ on the time). If $P_n(t)$ is defined to be zero when $n < 0$,
the first of the above equations will then be true for all $n$, and accordingly the
generating function

$$\varphi(z, t) = \sum_{n=0}^{\infty} P_n(t)\,z^n \tag{5}$$

must satisfy the linear partial differential equation

$$\frac{\partial\varphi}{\partial t} = (\lambda z - \mu)(z - 1)\frac{\partial\varphi}{\partial z}; \tag{6}$$

the problem is to find the solution to this equation when it is coupled with the
boundary condition $\varphi(z, 0) = z$.

The equation (6) is of Lagrange’s type, and can be solved in the usual manner.
The auxiliary equation is

$$\frac{dz}{dt} = -\mu + (\lambda + \mu)z - \lambda z^2, \tag{7}$$

and while in particular examples it might be convenient to attack this equation
directly, progress in general is more easily made by observing that (7) is of
Riccati’s form, for which a general theory is available.$^{2}$ The fundamental
property of a Riccati equation is that the general solution is a homographic
function of the constant of integration, so that

$$z = \frac{f_1 + C f_2}{f_3 + C f_4},$$

and equally

$$C = \frac{z f_3 - f_1}{f_2 - z f_4},$$

where $f_1$, $f_2$, $f_3$ and $f_4$ are all functions of the time $t$. Thus the general solution
of (6) is of the form

$$\varphi(z, t) = \Phi\!\left(\frac{z f_3 - f_1}{f_2 - z f_4}\right),$$

and from the boundary condition $\varphi(z, 0) = z$ it then follows that

$$\varphi(z, t) = \frac{g_1(t) + z\,g_2(t)}{g_3(t) + z\,g_4(t)}.$$

$^{2}$ See, for example, G. N. Watson [4], pp. 93-94.



On expansion, one obtains

$$P_0(t) = \xi_t \quad\text{and}\quad P_n(t) = \{1 - P_0(t)\}(1 - \eta_t)\,\eta_t^{\,n-1} \quad (n \ge 1), \tag{8}$$

where $\xi_t$ and $\eta_t$ are functions of the time $t$. Thus, for the general $(\lambda, \mu)$ process,
the population size at any time is distributed in a geometric series with a modified
zero term.

The next stage of the solution is to determine the functions $\xi_t$ and $\eta_t$. From (8),

$$\varphi(z, t) = \frac{\xi + (1 - \xi - \eta)z}{1 - \eta z}, \tag{9}$$

and if this expression for $\varphi$ be substituted in (6) it will be found that$^{3}$

$$(\eta\xi' - \xi\eta') + \eta' = \lambda(1 - \xi)(1 - \eta),$$

and

$$\xi' = \mu(1 - \xi)(1 - \eta).$$

Now let $U = 1 - \xi$ and $V = 1 - \eta$, so that

$$U'/U = -\mu V,$$

and

$$V' = (\mu - \lambda)V - \mu V^2.$$

The last equation is of Bernoulli’s type and can be solved by writing

$$W = 1/V,$$

so that

$$W' + (\mu - \lambda)W = \mu.$$

Initially $\xi = \eta = 0$, and $U = V = W = 1$; the solution of the $W$-equation is
therefore

$$W = e^{-\rho}\left(1 + \int_0^t e^{\rho(\tau)}\mu(\tau)\,d\tau\right), \tag{10a}$$

where the function $\rho$ is defined by

$$\rho(t) = \int_0^t \{\mu(\tau) - \lambda(\tau)\}\,d\tau. \tag{11}$$

$^{3}$ Here $\xi' = d\xi/dt$, etc.


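Integration by parts gives two other formulae for $W$ which will prove useful;
they are

$$W = 1 + e^{-\rho}\int_0^t e^{\rho(\tau)}\lambda(\tau)\,d\tau, \tag{10b}$$

and

$$W = \tfrac12\left(1 + e^{-\rho}\right) + \tfrac12 e^{-\rho}\int_0^t e^{\rho(\tau)}\{\lambda(\tau) + \mu(\tau)\}\,d\tau. \tag{10c}$$

The quantities $U$ and $V$, and hence also $\xi$ and $\eta$, can now be expressed in terms of
$\rho$ and $W$, for

$$\frac{U'}{U} = -\frac{\mu}{W} = -\rho' - \frac{W'}{W}, \qquad V = \frac{1}{W},$$

and so

$$\xi_t = 1 - \frac{e^{-\rho}}{W} \quad\text{and}\quad \eta_t = 1 - \frac{1}{W}. \tag{12}$$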


These results, together with (8), suffice to determine completely the $P_n(t)$ as
functions of the time $t$.
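
The chain (11), (10a), (12), (8) can be followed numerically. The sketch below is an illustration only and is not part of the original paper: the function names (`trapezoid`, `rho`, `xi_eta`, `distribution`) are hypothetical, and the integrals are approximated by a plain trapezoidal rule, which assumes that the supplied rate functions are smooth on the interval of interest.

```python
import math

def trapezoid(f, a, b, steps=2000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    if b == a:
        return 0.0
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

def rho(lam, mu, t, steps=2000):
    """Equation (11): integral of mu - lam from 0 to t."""
    return trapezoid(lambda x: mu(x) - lam(x), 0.0, t, steps)

def xi_eta(lam, mu, t, steps=400):
    """Equations (10a) and (12): return the pair (xi_t, eta_t)."""
    inner = lambda x: math.exp(rho(lam, mu, x, steps)) * mu(x)
    r = rho(lam, mu, t, steps)
    w = math.exp(-r) * (1.0 + trapezoid(inner, 0.0, t, steps))
    return 1.0 - math.exp(-r) / w, 1.0 - 1.0 / w

def distribution(lam, mu, t, n_max):
    """Equation (8): [P_0(t), ..., P_{n_max}(t)] for a single ancestor."""
    xi, eta = xi_eta(lam, mu, t)
    probs = [xi]
    for n in range(1, n_max + 1):
        probs.append((1.0 - xi) * (1.0 - eta) * eta ** (n - 1))
    return probs
```

A call such as `distribution(lambda t: 1.0, lambda t: 2.0, 1.0, 60)` returns the first 61 probabilities for constant rates; the nested quadrature is slow but keeps the correspondence with the formulae transparent.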

It is easy to deduce formulae for the mean and variance of $n_t$ (these could also
be obtained directly from (6)). For the mean,

$$\bar{n}_t = e^{-\rho}, \tag{13}$$

while for the variance,

$$\mathrm{Var}(n_t) = \frac{(1 - \xi)(\xi + \eta)}{(1 - \eta)^2} = e^{-\rho}\left(2W - 1 - e^{-\rho}\right) = e^{-2\rho}\int_0^t e^{\rho(\tau)}\{\lambda(\tau) + \mu(\tau)\}\,d\tau. \tag{14c}$$

Alternatively, using the other forms for $W$, one can write

$$\mathrm{Var}(n_t) = e^{-\rho}\left\{e^{-\rho} - 1 + 2e^{-\rho}\int_0^t e^{\rho(\tau)}\mu(\tau)\,d\tau\right\}, \tag{14a}$$

$$\mathrm{Var}(n_t) = e^{-\rho}\left\{1 - e^{-\rho} + 2e^{-\rho}\int_0^t e^{\rho(\tau)}\lambda(\tau)\,d\tau\right\}. \tag{14b}$$

If the initial population $n_0 = N > 1$, these formulae for $\bar{n}_t$ and $\mathrm{Var}(n_t)$ are to
be multiplied by $N$.


It is now a simple matter to apply these formulae to the Arley $(\lambda_0, \mu_1 t)$ process.
It will be found that

$$\rho = \tfrac12\mu_1 t^2 - \lambda_0 t,$$

and

$$W = 1 + \lambda_0\,e^{\lambda_0 t - \frac12\mu_1 t^2}\int_0^t e^{\frac12\mu_1\tau^2 - \lambda_0\tau}\,d\tau.$$

The mean growth of the process therefore follows the Gaussian law

$$\bar{n}_t = e^{\lambda_0 t - \frac12\mu_1 t^2},$$

while for the variance (using (14b), since $\lambda$ is a constant) one finds

$$\mathrm{Var}(n_t) = \bar{n}_t(1 - \bar{n}_t) + 2\lambda_0\,\bar{n}_t^{\,2}\int_0^t e^{\frac12\mu_1\tau^2 - \lambda_0\tau}\,d\tau,$$

in agreement with Arley [3] and Bartlett [2]. The distribution of $n_t$ at time $t$
follows on inserting the above values of $\rho$ and $W$ into (8) and (12).
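
The Arley formulae lend themselves to a quick numerical check. The following sketch is illustrative only: the names `arley_mean` and `arley_variance` are hypothetical, and the integral is approximated by a midpoint rule with an arbitrarily chosen number of steps.

```python
import math

def arley_mean(lam0, mu1, t):
    """Mean population size exp(lam0*t - mu1*t^2/2) for the Arley process."""
    return math.exp(lam0 * t - 0.5 * mu1 * t * t)

def arley_variance(lam0, mu1, t, steps=4000):
    """Variance from (14b) with constant birth rate lam0 (midpoint rule)."""
    h = t / steps
    integral = 0.0
    for i in range(steps):
        tau = (i + 0.5) * h
        integral += math.exp(0.5 * mu1 * tau * tau - lam0 * tau)
    integral *= h
    m = arley_mean(lam0, mu1, t)
    return m * (1.0 - m) + 2.0 * lam0 * m * m * integral
```

For instance, with `lam0 = 1.0` and `mu1 = 0.5` the mean returns to unity at `t = 4.0`, since the exponent `lam0*t - mu1*t^2/2` vanishes there.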


3. The chances of extinction. The simplest special case is that in which
$(\lambda, \mu)$ have the constant values $(\lambda_0, \mu_0)$; this is the process introduced by Feller
[1] and later discussed by several writers.$^{4}$ The formulae (13) and (14c) give
at once the results

$$\bar{n}_t = e^{(\lambda_0 - \mu_0)t} \quad\text{and}\quad \mathrm{Var}(n_t) = \frac{\lambda_0 + \mu_0}{\lambda_0 - \mu_0}\,e^{(\lambda_0 - \mu_0)t}\left(e^{(\lambda_0 - \mu_0)t} - 1\right), \tag{15}$$

due to Feller, while since

$$W = \frac{\lambda_0 e^{(\lambda_0 - \mu_0)t} - \mu_0}{\lambda_0 - \mu_0},$$

equations (8) and (12) give

$$P_0(t) = \frac{\mu_0\left(e^{(\lambda_0 - \mu_0)t} - 1\right)}{\lambda_0 e^{(\lambda_0 - \mu_0)t} - \mu_0} \quad\text{and}\quad P_n(t) = \{1 - P_0(t)\}(1 - \eta_t)\,\eta_t^{\,n-1} \quad (n \ge 1), \tag{16}$$

where

$$\eta_t = \frac{\lambda_0}{\mu_0}P_0(t) = \frac{\lambda_0\left(e^{(\lambda_0 - \mu_0)t} - 1\right)}{\lambda_0 e^{(\lambda_0 - \mu_0)t} - \mu_0}.$$

These formulae were first given by C. Palm. They actually hold only if
$\lambda_0 \neq \mu_0$; in the case of equality, $W = 1 + \lambda_0 t$, and then

$$\bar{n}_t = 1, \qquad \mathrm{Var}(n_t) = 2\lambda_0 t,$$

$$P_0(t) = \frac{\lambda_0 t}{1 + \lambda_0 t} \quad\text{and}\quad P_n(t) = \{1 - P_0(t)\}(1 - \eta_t)\,\eta_t^{\,n-1} \quad (n \ge 1), \tag{17}$$

where $\eta_t = P_0(t)$.

$^{4}$ See Arley [3], Arley and Borchsenius [5], Bartlett [2] and Kendall [6]. Palm’s formulae
(16) are stated without proof by Arley and Borchsenius, but it appears from their remarks
that he used a generating function method probably identical with that later employed by
Bartlett and myself.

One particularly interesting point is that

$$P_0(t) \to 1 \ \text{ as } \ t \to \infty \quad\text{if } \lambda_0 \le \mu_0,$$

so that the population is “almost certain” to die out, even though in the critical
case ($\lambda_0 = \mu_0$) the expected population size $\bar{n}_t$ has a constant value. The same is
true for any initial size of population; the new expression for $P_0(t)$ is then simply
equal to the former one raised to the power $n_0 = N$, and therefore tends to unity
as before. This phenomenon of extinction was first noticed in a similar problem$^{5}$
by Francis Galton and H. W. Watson; an account of their work is given in Appendix
F of Galton’s book [7].
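
The limiting behaviour of $P_0(t)$ for constant rates is easily tabulated from (16) and (17). The sketch below is illustrative; the function names are hypothetical. The limit $\mu_0/\lambda_0$ approached when $\lambda_0 > \mu_0$ is read off from (16) here and is not quoted from the text.

```python
import math

def p0_simple(lam0, mu0, t):
    """Extinction probability by time t, from equations (16) and (17)."""
    if lam0 == mu0:
        return lam0 * t / (1.0 + lam0 * t)
    e = math.exp((lam0 - mu0) * t)
    return mu0 * (e - 1.0) / (lam0 * e - mu0)

def p0_from_n_ancestors(lam0, mu0, t, n):
    """N independent lines of descent: the one-ancestor value to the power n."""
    return p0_simple(lam0, mu0, t) ** n
```

Evaluating `p0_simple(1.0, 1.0, t)` for increasing `t` shows the slow approach to unity in the balanced case, in contrast with the exponential approach when `lam0 < mu0`.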

$^{5}$ The extinction of family-names. Further references will be found in my paper [6].

The formulae of the last section now make possible a discussion of the chances
of extinction for the general $(\lambda, \mu)$ process. When $n_0 = 1$,

$$P_0(t) = \frac{\int_0^t e^{\rho(\tau)}\mu(\tau)\,d\tau}{1 + \int_0^t e^{\rho(\tau)}\mu(\tau)\,d\tau}, \tag{18}$$

and so the necessary and sufficient condition for the ultimate extinction of the
population is that the integral

$$I = \int_0^\infty e^{\rho(\tau)}\mu(\tau)\,d\tau \tag{19}$$

should be divergent.

It will be noticed that the integrand of (19) is non-negative, and so the integral
must either diverge to plus infinity, or have a finite value. Hence in any
case the population always has a definite chance of extinction, given by $I/(1 + I)$.
For a population descended from $N$ initial ancestors, the $P_n(t)$ are generated
by the function

$$\left\{\frac{\xi + (1 - \xi - \eta)z}{1 - \eta z}\right\}^{N}, \tag{20}$$

so that

$$P_0(t) = \xi_t^{\,N},$$

and the chance of ultimate extinction is

$$\left(\frac{I}{1 + I}\right)^{N}, \tag{21}$$

which is or is not equal to unity for all N indifferently. 
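
For rate functions without a convenient closed form, the integral (19) can be approximated numerically. The sketch below is an assumption-laden illustration: the names are hypothetical, the upper limit is truncated at a user-chosen `horizon`, and a midpoint rule is used, so that the result is trustworthy only when the truncated tail of (19) is negligible.

```python
import math

def extinction_integral(lam, mu, horizon, steps=20000):
    """Approximate I of (19), truncated at `horizon`, by the midpoint rule."""
    h = horizon / steps
    rho = 0.0       # running value of rho at the left end of each step
    total = 0.0
    for i in range(steps):
        mid = (i + 0.5) * h
        rate = mu(mid) - lam(mid)
        rho_mid = rho + 0.5 * h * rate   # rho at the midpoint of the step
        total += math.exp(rho_mid) * mu(mid) * h
        rho += h * rate                  # advance rho to the next grid point
    return total

def extinction_probability(lam, mu, horizon, n_ancestors=1):
    """Chance of extinction {I/(1+I)}^N, with I truncated at `horizon`."""
    i_val = extinction_integral(lam, mu, horizon)
    return (i_val / (1.0 + i_val)) ** n_ancestors
```

With constant rates `lam = 2`, `mu = 1` the integral converges to unity, giving a chance of extinction of one half for a single ancestor.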

Extinction is impossible, in the sense of being an event of zero probability, if
and only if $\mu$ is identically zero, so that the process is one of reproduction only.
It is also worth noting that a necessary but not sufficient condition for almost
certain extinction is the divergence of the integral

$$\int_0^\infty \mu(\tau)\,d\tau. \tag{22}$$

For if (22) had a finite value, $\rho(t)$ would be bounded above for all $t$, and so (19) could
not be divergent. In general, when $I = \infty$ and the population is almost certainly
doomed to extinction, I shall speak of the process as transient.

For a transient process it is of interest to consider the random variable $T$,
defined to be the “age” of the process at the moment of extinction. Since

$$P_0(t) = \text{Probability}\,\{T < t\},$$

the probability distribution of $T$ is $P_0'(T)\,dT$, or

$$\frac{e^{\rho(T)}\mu(T)\,dT}{\left\{1 + \int_0^T e^{\rho(\tau)}\mu(\tau)\,d\tau\right\}^2}, \qquad 0 < T < \infty. \tag{23}$$

For example, in the simplest birth-and-death process, when $\lambda$ and $\mu$ are equal
constants, the distribution of $T$ is

$$\frac{\lambda_0\,dT}{(1 + \lambda_0 T)^2}, \qquad 0 < T < \infty. \tag{24}$$

This is for an initial population $n_0 = 1$; more generally, when $n_0 = N > 1$, the
distribution of $T$ is

$$N P_0'(T)\{P_0(T)\}^{N-1}\,dT.$$

The median life-time $T_m$ is determined by the relation

$$\int_0^{T_m} e^{\rho(\tau)}\mu(\tau)\,d\tau = 1. \tag{25}$$

For the simple process, $T_m = 1/\lambda_0$ when $\lambda_0 = \mu_0$, and more generally

$$T_m = \frac{1}{\mu_0 - \lambda_0}\log\left(2 - \frac{\lambda_0}{\mu_0}\right) \qquad (\lambda_0 \neq \mu_0) \tag{26}$$

if $n_0 = 1$. When $n_0 = N > 1$, the formula for $T_m$ becomes

$$\int_0^{T_m} e^{\rho(\tau)}\mu(\tau)\,d\tau = \frac{1}{2^{1/N} - 1}. \tag{27}$$


For the balanced process $(\lambda_0, \lambda_0)$ it therefore follows that

$$T_m(N) = T_m(1)/(2^{1/N} - 1) \sim 1.44\,N\,T_m(1), \tag{28}$$

as $N$ tends to infinity. If the process is unbalanced, however, so that $\lambda_0 < \mu_0$,
this asymptotic proportionality to $N$ does not hold, and instead

$$T_m = \frac{1}{\mu_0 - \lambda_0}\log\left\{1 + \frac{\mu_0 - \lambda_0}{(2^{1/N} - 1)\mu_0}\right\} \sim \frac{\log N}{\mu_0 - \lambda_0}, \tag{29}$$


as N tends to infinity. 
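
The median life-time formulae (26) to (29) can be collected into one small routine. This is a sketch only; the name `median_lifetime` is hypothetical, and the error raised when the chance of extinction never reaches one half is a design choice of the sketch, not something stated in the text.

```python
import math

def median_lifetime(lam0, mu0, n=1):
    """Median extinction time T_m for the simple process, solved from (27)."""
    target = 1.0 / (2.0 ** (1.0 / n) - 1.0)   # right-hand side of (27)
    if lam0 == mu0:
        return target / lam0                   # the integral in (27) is lam0 * T
    arg = 1.0 + (mu0 - lam0) * target / mu0
    if arg <= 0.0:
        raise ValueError("extinction probability never reaches 1/2")
    return math.log(arg) / (mu0 - lam0)
```

Comparing `median_lifetime(1.0, 1.0, n) / n` with `1 / math.log(2)` for large `n` illustrates (28), while `median_lifetime(1.0, 2.0, n)` grows only like `math.log(n)`, as in (29).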


4. The cumulative population. There is associated with a birth-and-death
process another random variable, $M_t$, which is of importance in some applications.
This is defined as follows: initially $M_0 = n_0$, while for $t > 0$, $M_t$ shares
all the positive jumps of $n_t$.

For example, if $n_t$ represents the number of cases of a disease in a population
at time $t$, $M_t$ will be the total number of cases which have been recorded up to
that time. If the process is transient, so that the epidemic is almost certainly
extinguished in the course of time, $M_\infty$ will then be a measure of its overall
severity.

Again, if $n_t$ represents the viable count of a population of bacteria$^{6}$ with a birth
rate $\lambda(t)$ and a death rate $\mu(t)$, $M_t$ will be equal to the total count in which living
and dead organisms are not distinguished.

$^{6}$ For some general remarks about birth-and-death processes in relation to bacterial
growth, reference may be made to my paper [6].

In order to discuss the joint variation of $n_t$ and $M_t$ it is necessary to introduce
the new generating function

$$\psi(z, w, t) = \sum_{n=0}^{\infty}\sum_{M=0}^{\infty} P_{n,M}(t)\,z^n w^M. \tag{30}$$

Here the $P_{n,M}(t)$ give the joint frequency-distribution of $n_t$ and $M_t$ at time $t$.
By the usual argument the differential equation satisfied by the function $\psi$
will be found to be

$$\frac{\partial\psi}{\partial t} = \{\lambda w z^2 - (\lambda + \mu)z + \mu\}\frac{\partial\psi}{\partial z}, \tag{31}$$

and the associated boundary condition (if initially $n_0 = M_0 = 1$) is

$$\psi(z, w, 0) = zw. \tag{32}$$

I have been unable to solve this equation for general $\lambda(t)$ and $\mu(t)$; the solution
when $\lambda$ and $\mu$ are constants will be given in the next section. It is however
possible to find general expressions for the mean and variance of $M_t$; for this
purpose it is more convenient$^{7}$ to work with the cumulant-generating function

$$K(u, v, t) = \log\psi(e^u, e^v, t). \tag{33}$$

This satisfies the differential equation

$$\frac{\partial K}{\partial t} = \left\{\lambda\left(e^{u+v} - 1\right) - \mu\left(1 - e^{-u}\right)\right\}\frac{\partial K}{\partial u}, \tag{34}$$

and of course

$$K = u\bar{n}_t + v\bar{M}_t + \tfrac12 u^2\,\mathrm{Var}(n_t) + \tfrac12 v^2\,\mathrm{Var}(M_t) + uv\,\mathrm{Cov}(n_t, M_t) + \cdots. \tag{35}$$

Expanding both sides of the equation in powers of $u$ and $v$, and equating coefficients,
one obtains the differential equations

$$\frac{d}{dt}\bar{n}_t = (\lambda - \mu)\bar{n}_t, \tag{36}$$

$$\frac{d}{dt}\mathrm{Var}(n_t) = (\lambda + \mu)\bar{n}_t + 2(\lambda - \mu)\,\mathrm{Var}(n_t), \tag{37}$$

$$\frac{d}{dt}\bar{M}_t = \lambda\bar{n}_t, \tag{38}$$

$$\frac{d}{dt}\mathrm{Var}(M_t) = \lambda\bar{n}_t + 2\lambda\,\mathrm{Cov}(n_t, M_t), \tag{39}$$

and

$$\frac{d}{dt}\mathrm{Cov}(n_t, M_t) = \lambda\bar{n}_t + \lambda\,\mathrm{Var}(n_t) + (\lambda - \mu)\,\mathrm{Cov}(n_t, M_t). \tag{40}$$


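The solutions to the first two equations have of course already been given in
section 2; from the third it follows that the mean value of $M_t$ is

$$\bar{M}_t = 1 + \int_0^t e^{-\rho(\tau)}\lambda(\tau)\,d\tau. \tag{41}$$

The solution of the fifth equation is

$$\mathrm{Cov}(n_t, M_t) = \bar{n}_t\int_0^t\left\{1 + \frac{\mathrm{Var}(n_\tau)}{\bar{n}_\tau}\right\}\lambda(\tau)\,d\tau, \tag{42}$$

and so the variance of $M_t$ is

$$\mathrm{Var}(M_t) = \int_0^t\{\bar{n}_\tau + 2\,\mathrm{Cov}(n_\tau, M_\tau)\}\lambda(\tau)\,d\tau. \tag{43}$$

$^{7}$ Compare Bartlett [2].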











In illustration of these formulae, consider first the Arley $(\lambda_0, \mu_1 t)$ process; from
(41)

$$\bar{M}_t = 1 + \lambda_0\int_0^t e^{\lambda_0\tau - \frac12\mu_1\tau^2}\,d\tau, \tag{44}$$

but the complete expression for $\mathrm{Var}(M_t)$ will be a multiple integral which does
not appear to admit of much simplification.

For the simple $(\lambda_0, \mu_0)$ process, however, when $\lambda_0 < \mu_0$, it readily follows that

$$\bar{M}_t = \frac{\mu_0 - \lambda_0\bar{n}_t}{\mu_0 - \lambda_0}, \tag{45}$$

$$\mathrm{Cov}(n_t, M_t) = \frac{\lambda_0\bar{n}_t}{\mu_0 - \lambda_0}\left\{2\mu_0 t - \frac{\lambda_0 + \mu_0}{\mu_0 - \lambda_0}(1 - \bar{n}_t)\right\}, \tag{46}$$

and

$$\mathrm{Var}(M_t) = \frac{\lambda_0(\mu_0 + \lambda_0)}{(\mu_0 - \lambda_0)^2}(1 - \bar{n}_t) - \frac{4\lambda_0^2\mu_0}{(\mu_0 - \lambda_0)^2}\,t\,\bar{n}_t + \frac{\lambda_0^2(\mu_0 + \lambda_0)}{(\mu_0 - \lambda_0)^3}(1 - \bar{n}_t^{\,2}). \tag{47}$$

Thus in the limit, as $t \to \infty$, the mean and variance of $M_\infty$ are

$$\bar{M}_\infty = \frac{\mu_0}{\mu_0 - \lambda_0} \quad\text{and}\quad \mathrm{Var}(M_\infty) = \frac{\lambda_0\mu_0(\lambda_0 + \mu_0)}{(\mu_0 - \lambda_0)^3}, \tag{48}$$

the covariance of course tending to zero. If the process is balanced, so that
$\lambda_0 = \mu_0$ and $\bar{n}_t = 1$, the integral for $\bar{M}_t$ has the value $1 + \lambda_0 t$, which increases
without limit as $t$ tends to infinity. This will always be so for a balanced process
if the integral

$$\int_0^\infty \lambda(\tau)\,d\tau$$

is divergent.

If the initial population $n_0$ is equal to $N > 1$, and if all its members are counted
into $M_0$, the only modification necessary to the above formulae is that in each
case the right-hand side is to be multiplied by $N$.
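
Since the moment equations (36) to (40) form a closed linear system, they can also be integrated numerically for arbitrary rate functions and compared with (45) to (47) in the constant case. The sketch below uses the classical fourth-order Runge-Kutta scheme; this is a choice made for illustration only, and the name `moment_odes` is hypothetical.

```python
import math

def moment_odes(lam, mu, t_end, steps=2000):
    """Integrate equations (36)-(40) by classical RK4 from n_0 = M_0 = 1.

    Returns [mean n, Var n, mean M, Var M, Cov(n, M)] at time t_end.
    """
    def deriv(t, y):
        lam_t, mu_t = lam(t), mu(t)
        n_bar, var_n, m_bar, var_m, cov = y
        return [
            (lam_t - mu_t) * n_bar,                                # (36)
            (lam_t + mu_t) * n_bar + 2.0 * (lam_t - mu_t) * var_n,  # (37)
            lam_t * n_bar,                                         # (38)
            lam_t * n_bar + 2.0 * lam_t * cov,                     # (39)
            lam_t * n_bar + lam_t * var_n + (lam_t - mu_t) * cov,  # (40)
        ]

    def shifted(y, k, scale):
        return [yi + scale * ki for yi, ki in zip(y, k)]

    y = [1.0, 0.0, 1.0, 0.0, 0.0]   # initial moments for a single ancestor
    h = t_end / steps
    for i in range(steps):
        t = i * h
        k1 = deriv(t, y)
        k2 = deriv(t + 0.5 * h, shifted(y, k1, 0.5 * h))
        k3 = deriv(t + 0.5 * h, shifted(y, k2, 0.5 * h))
        k4 = deriv(t + h, shifted(y, k3, h))
        y = [yi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
    return y
```

The same routine accepts time-dependent rates, for example `moment_odes(lambda t: 1.0, lambda t: 0.5 * t, 3.0)` for an Arley process, where no simple closed form for the variance of the cumulative population is available.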


5. The asymptotic distribution of the cumulative population for a simple
transient birth-and-death process. The equation (31), which appears in the
general case to be intractable even if one only requires the asymptotic distribution
determined by $\psi(1, w, \infty)$, can be solved completely in the specially simple
case when the birth and death rates $\lambda(t)$ and $\mu(t)$ have the constant values $\lambda_0$
and $\mu_0$.

Let $\alpha$ and $\beta$ be the roots of the quadratic

$$\lambda_0 w z^2 - (\lambda_0 + \mu_0)z + \mu_0 = 0, \tag{49}$$

so chosen that $0 < \alpha < 1 < \beta$; then the general solution of (31) will be found by
the usual method to be

$$\psi = \Psi\!\left(\frac{z - \alpha}{\beta - z}\,e^{-\lambda_0 w(\beta - \alpha)t}\right).$$

The boundary condition $\psi(z, w, 0) = zw$ therefore gives

$$\psi = w\,\frac{\alpha(\beta - z) + \beta(z - \alpha)e^{-\lambda_0 w(\beta - \alpha)t}}{(\beta - z) + (z - \alpha)e^{-\lambda_0 w(\beta - \alpha)t}}, \tag{50}$$

and it may be noted that if $n_0 = M_0 = N > 1$, this formula for $\psi$ would have to
be raised to the $N$th power. It will suffice, however, to discuss the simplest
case when $n_0 = M_0 = 1$.

Let the process be transient, so that $\lambda_0 \le \mu_0$; then the asymptotic frequency
distribution of $M_t$ when $t \to \infty$ is determined by the generating function

$$\psi(1, w, \infty) = w\alpha = \frac{(\lambda_0 + \mu_0) - \sqrt{(\lambda_0 + \mu_0)^2 - 4\lambda_0\mu_0 w}}{2\lambda_0}, \tag{51}$$

and here it is the positive square root which must be taken. The probability
distribution of $M_\infty$ is thus

$$Q_M = \frac{\lambda_0 + \mu_0}{2\lambda_0}\,\frac{(2M)!}{2^{2M}\,M!\,M!\,(2M - 1)}\,x^M \qquad (M = 1, 2, 3, \ldots), \tag{52}$$

where

$$x = \frac{4\lambda_0\mu_0}{(\lambda_0 + \mu_0)^2}. \tag{53}$$



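The first few terms are

$$Q_1 = \frac{\mu_0}{\lambda_0 + \mu_0}, \qquad Q_2 = \frac{\lambda_0\mu_0^2}{(\lambda_0 + \mu_0)^3}, \qquad Q_3 = \frac{2\lambda_0^2\mu_0^3}{(\lambda_0 + \mu_0)^5}, \qquad \ldots \tag{54}$$
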
and it is easy to verify that the mean and variance of this distribution agree
with the values given in the last section. When $\lambda_0 = \mu_0$, $x = 1$, and then the
terms in (54) fall off to zero like $M^{-3/2}$, $\bar{M}_\infty$ being infinite (in accordance with the
remarks at the end of section 4).
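
The agreement of the mean and variance with (48) can be confirmed numerically. The sketch below is illustrative: the name `cumulative_pmf` is hypothetical, and the probabilities are built from the ratio of successive terms of (52), namely $Q_{M+1}/Q_M = (2M - 1)x/\{2(M + 1)\}$, rather than from the factorials directly, in order to avoid very large intermediate numbers.

```python
import math

def cumulative_pmf(lam0, mu0, m_max):
    """Q_M of (52) for M = 1..m_max, built by a term-to-term recurrence."""
    x = 4.0 * lam0 * mu0 / (lam0 + mu0) ** 2        # equation (53)
    q = mu0 / (lam0 + mu0)                          # Q_1
    probs = [q]
    for m in range(1, m_max):
        # Ratio Q_{m+1} / Q_m implied by (52): (2m - 1) x / (2 (m + 1)).
        q *= (2 * m - 1) * x / (2.0 * (m + 1))
        probs.append(q)
    return probs
```

For `lam0 = 1.0`, `mu0 = 2.0` the truncated sums of `M * Q_M` and of the second central moment settle at the values 2 and 6 given by (48).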


6. The determination of the process when its mean growth, $\bar{n}_t$, is given.
Since $\bar{n}_t = e^{-\rho(t)}$, it follows that

$$\lambda(t) - \mu(t) = \frac{d}{dt}\log\bar{n}_t, \tag{55}$$

and thus if $\bar{n}_t$ is required to be a given function of the time, the birth and death
rates must be chosen in accordance with (55); the only other condition is that
for all $t$, $\lambda(t) \ge 0$ and $\mu(t) \ge 0$.

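Arley has pointed out that the simple process ($\lambda(t) = c$, $\mu(t) = 0$) gives a
smaller fluctuation, $\mathrm{Var}(n_t)$, than any other simple process with the same mean
growth, say $(\lambda_0, \mu_0)$ where $\lambda_0 - \mu_0 = c$. This suggests that one should consider
the more general question: if $\bar{n}_t$ is given for all $t$, for which choice of the functions
$\lambda(t)$ and $\mu(t)$ will the fluctuation $\mathrm{Var}(n_t)$ be a minimum?

Suppose then that the whole region $t \ge 0$ consists of three sets of intervals,
$E_1$, $E_2$ and $E_3$, and that within an interval of the set $E_j$,

$\bar{n}_t$ is a decreasing function if $j = 1$,
$\bar{n}_t$ is an increasing function if $j = 2$,
and $\bar{n}_t$ is a constant if $j = 3$.

Then one can write

$$\begin{aligned}
\mathrm{Var}(n_t) = {} & e^{-2\rho}\,\Sigma_1[e^{\rho}] + 2e^{-2\rho}\int_{E_1} e^{\rho(\tau)}\lambda(\tau)\,d\tau \\
& + e^{-2\rho}\,\Sigma_2[e^{\rho}] + 2e^{-2\rho}\int_{E_2} e^{\rho(\tau)}\mu(\tau)\,d\tau \\
& + e^{-2\rho}\int_{E_3} e^{\rho(\tau)}\{\lambda(\tau) + \mu(\tau)\}\,d\tau,
\end{aligned}$$

where $\Sigma_j[e^{\rho}]$ denotes the total variation of $e^{\rho}$ over the intervals of $E_j$ lying in
$(0, t)$, a quantity fixed by $\bar{n}_t$ alone, and the integrals likewise extend over the parts of
$E_j$ lying in $(0, t)$.
Here the terms involving $\lambda$ and $\mu$ explicitly are all non-negative, and so
$\mathrm{Var}(n_t)$ will be a minimum for the (unique) choice of $\lambda$ and $\mu$ which makes them all
vanish, namely:

$$\begin{aligned}
&\text{in } E_1, & \lambda(t) &= 0 \ \text{ and } \ \mu(t) = -\bar{n}_t'/\bar{n}_t; \\
&\text{in } E_2, & \lambda(t) &= \bar{n}_t'/\bar{n}_t \ \text{ and } \ \mu(t) = 0; \\
&\text{in } E_3, & \lambda(t) &= \mu(t) = 0.
\end{aligned} \tag{56}$$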
However, when one is looking for a $(\lambda, \mu)$ process with a given $\bar{n}_t$ function,
this minimum-fluctuation solution would frequently be an artificial one. For
example, suppose it is required that $\bar{n}_t$ shall be a Gaussian curve, reducing to
unity when $t = 0$; then

$$\bar{n}_t = e^{\lambda_0 t - \frac12\mu_1 t^2}, \tag{57}$$

say, and $\lambda(t) - \mu(t) = \lambda_0 - \mu_1 t$; the most natural solution is then the Arley process,

$$\lambda(t) = \lambda_0, \qquad \mu(t) = \mu_1 t.$$

It is of interest that a $(\lambda, \mu)$ process can be found for which the expected growth
follows a logistic law,

$$\bar{n}_t = \frac{\alpha}{1 + (\alpha - 1)e^{-\beta t}} \qquad (\alpha > 1,\ \beta > 0). \tag{58}$$

According to (55) one must have

$$\lambda(t) - \mu(t) = \frac{(\alpha - 1)\beta}{e^{\beta t} + (\alpha - 1)}.$$

The minimum-fluctuation solution is thus the purely reproductive process

$$\lambda(t) = \frac{(\alpha - 1)\beta}{e^{\beta t} + (\alpha - 1)}, \qquad \mu(t) = 0, \tag{59}$$

which satisfies the relation

$$\lambda(t) = \beta\left(1 - \frac{\bar{n}_t}{\alpha}\right), \tag{60}$$

as might have been expected, since the Verhulst-Pearl-Reed differential equation
(which forms the deterministic basis for the logistic law) is

$$\frac{1}{n}\frac{dn}{dt} = \beta\left(1 - \frac{n}{\alpha}\right). \tag{61}$$
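
The relations (55), (59) and (60) for the logistic law can be checked against one another numerically. The sketch is illustrative only: the function names are hypothetical, and the derivative in (55) is estimated by a central difference with an arbitrarily chosen step.

```python
import math

def logistic_mean(alpha, beta, t):
    """Equation (58): expected population size under the logistic law."""
    return alpha / (1.0 + (alpha - 1.0) * math.exp(-beta * t))

def min_fluctuation_birth_rate(alpha, beta, t):
    """Equation (59): birth rate of the purely reproductive solution."""
    return (alpha - 1.0) * beta / (math.exp(beta * t) + (alpha - 1.0))

def log_derivative(f, t, h=1e-5):
    """Central-difference estimate of d/dt log f(t), as required by (55)."""
    return (math.log(f(t + h)) - math.log(f(t - h))) / (2.0 * h)
```

For example, `log_derivative(lambda s: logistic_mean(10.0, 0.7, s), 1.0)` should agree closely with `min_fluctuation_birth_rate(10.0, 0.7, 1.0)`.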


7. “Periodic” birth-and-death processes. As a further example of the general
theory it is worth considering the “periodic” processes for which the expected
growth $\bar{n}_t$ is a function of the time which repeats itself with the period $\omega$. It
will then follow that $\rho(t)$ and so also $\lambda(t) - \mu(t)$ have the period $\omega$, while $\rho(t)$
must be zero whenever $t$ is an integer multiple of $\omega$. The only cases of interest
are those in which $\lambda$ and $\mu$ are separately periodic, and then it can be seen from
(14c) that

$$\bar{n}_t = n_0 \quad\text{and}\quad \mathrm{Var}(n_t) = k\,n_0\int_0^{\omega} e^{\rho(\tau)}\{\lambda(\tau) + \mu(\tau)\}\,d\tau, \tag{62}$$

whenever $t = k\omega$, for every positive integer $k$. Thus, although the expected
value of $n_t$ repeats itself regularly, in practice this “periodicity” would be obscured
by the rapid increase, with increasing $t$, in the magnitude of the random
fluctuations (as measured by $\mathrm{Var}(n_t)$). Moreover, since

$$\int_0^{k\omega} e^{\rho(\tau)}\mu(\tau)\,d\tau = k\int_0^{\omega} e^{\rho(\tau)}\mu(\tau)\,d\tau,$$

it is clear that the process is necessarily transient, there being unit probability
that $n_t$ will ultimately be reduced to zero.

Periodic birth-and-death processes are likely to be of importance in biology; 
it should be pointed out, however, that this type of process describes the stochas- 
tic modification of a regular periodicity imposed on the model from outside, and 
it is not to be confused with other stochastic models which themselves generate 
irregular (non-phase-keeping) oscillations. The models discussed in this section 
are in fact suitable for the quantitative description of seasonal influences. 

Before going into further detail it is natural to specialise the model by assuming
that the functions $\lambda$ and $\mu$ are at most simply harmonic. If $n_0 = 1$, and since
there is to be no damping, one will then have

$$\bar{n}_t = e^{\alpha\{\sin\nu(t + \epsilon) - \sin\nu\epsilon\}} \qquad (\alpha > 0), \tag{63}$$











where $\nu\omega = 2\pi$, and $\alpha$ and $\epsilon$ are amplitude and phase constants, respectively.
The functions $\lambda$ and $\mu$ are now to be determined from the relation

$$\lambda - \mu = \alpha\nu\cos\nu(t + \epsilon),$$

and this can be done in many ways. The minimum-fluctuation solution would
here be artificial, and it is more natural to select two other solutions,

$$\lambda = \alpha\nu\{1 + \cos\nu(t + \epsilon)\}, \qquad \mu = \alpha\nu, \tag{64}$$

and

$$\lambda = \alpha\nu, \qquad \mu = \alpha\nu\{1 - \cos\nu(t + \epsilon)\}, \tag{65}$$

for further consideration. In the first of these the death rate is constant and
the birth rate executes simple-harmonic oscillations, while in the second it is the
birth rate which is constant, and the death rate which oscillates. It can be seen
that, of all solutions of these two types, (64) and (65) are those with the least
value for $\mathrm{Var}(n_t)$. From formulae (14a) and (14b) it will be found that, for
either process,

$$\mathrm{Var}(n_t) = 4\pi k\alpha\,I_0(\alpha)\,e^{\alpha\sin\nu\epsilon} \quad\text{when } t = k\omega, \tag{66}$$

where $I_0(\alpha)$ is the Bessel function of zero order, of the first kind and of imaginary
argument. (It will be noticed that, whenever $t$ is an integer multiple of $\omega$, the
distribution of the population size $n_t$ is the same for the two models.) For small
oscillations, when $t = k\omega$,

$$\mathrm{Var}(n_t) \sim 4\pi k\alpha \quad\text{as } \alpha \to 0, \tag{67}$$

since $I_0(0) = 1$, while for large oscillations

$$\mathrm{Var}(n_t) \sim 2k(2\pi\alpha)^{1/2}/\bar{n}_{\min} \quad\text{as } \alpha \to \infty. \tag{68}$$

(Here $\bar{n}_{\min}$ is the minimum value of $\bar{n}_t$.)

The calculation of $P_0(\omega)$ presents some points of interest. For either model
it proves to be

$$P_0(\omega) = \frac{2\pi\alpha\,I_0(\alpha)\,e^{\alpha\sin\nu\epsilon}}{1 + 2\pi\alpha\,I_0(\alpha)\,e^{\alpha\sin\nu\epsilon}}; \tag{69}$$

this is the probability that a population element, known to be descended from a
single individual at time $t = 0$, will have become extinct one year later (if one
identifies the oscillations with a seasonal effect). It will be seen that $P_0(\omega)$
will be least when $\sin\nu\epsilon = -1$, and greatest when $\sin\nu\epsilon = +1$; i.e. when
$\bar{n}_t$ is expected to have a minimum, or a maximum, at $t = 0$, respectively. Accordingly
it follows that the progeny of a new member of the population is most
likely to survive till the following year if the “ancestor” commences its “membership”
at a time of year when the population would normally have its minimum
value.
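
The closed forms (66) and (69) can be compared with a direct numerical evaluation of (62) for the model (64). The sketch below is illustrative: all function names are hypothetical, the period is taken as $2\pi/\nu$, $I_0$ is summed from its power series, and the integral over one period is approximated by a midpoint rule.

```python
import math

def bessel_i0(a, terms=60):
    """Modified Bessel function I_0(a), summed from its power series."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (a / 2.0) ** 2 / ((k + 1) ** 2)
    return total

def periodic_variance(alpha, nu, eps, k, steps=20000):
    """Var(n_t) at t = k periods for model (64), from (62) with n_0 = 1."""
    period = 2.0 * math.pi / nu
    h = period / steps
    total = 0.0
    for i in range(steps):
        tau = (i + 0.5) * h
        phase = nu * (tau + eps)
        rho = -alpha * (math.sin(phase) - math.sin(nu * eps))
        lam = alpha * nu * (1.0 + math.cos(phase))
        mu = alpha * nu
        total += math.exp(rho) * (lam + mu) * h
    return k * total

def periodic_variance_closed(alpha, nu, eps, k):
    """Equation (66): 4 pi k alpha I_0(alpha) exp(alpha sin(nu eps))."""
    return (4.0 * math.pi * k * alpha * bessel_i0(alpha)
            * math.exp(alpha * math.sin(nu * eps)))

def extinction_after_one_period(alpha, nu, eps):
    """Equation (69): chance of extinction within one period."""
    j = (2.0 * math.pi * alpha * bessel_i0(alpha)
         * math.exp(alpha * math.sin(nu * eps)))
    return j / (1.0 + j)
```

With `nu = 2 * math.pi` (a period of one year), the phase `eps = 0.75` corresponds to a start at the seasonal minimum and `eps = 0.25` to a start at the maximum, so the two values of `extinction_after_one_period` illustrate the closing remark of this section.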





In conclusion, I wish to thank Professor M. S. Bartlett for many helpful discussions
on the subject of this paper.


REFERENCES 


[1] W. FELLER, “Die Grundlagen der Volterraschen Theorie des Kampfes ums Dasein in
wahrscheinlichkeitstheoretischer Behandlung”, Acta Biotheoretica, Vol. 5 (1939),
pp. 11-40.

[2] M. S. BARTLETT, Stochastic Processes (notes of a course given at the University of North
Carolina in the Fall Quarter, 1946). It is understood that copies of these notes
are available on request.

[3] N. ARLEY, On the Theory of Stochastic Processes and their Application to the Theory of
Cosmic Radiation, G. E. C. Gads Forlag, Copenhagen, 1943, pp. 106-114.

[4] G. N. WATSON, The Theory of Bessel Functions, University Press, Cambridge, England,
1944.

[5] N. ARLEY AND V. BORCHSENIUS, “On the theory of infinite systems of differential equations
and their application to the theory of stochastic processes and the perturbation
theory of quantum mechanics”, Acta Mathematica, Vol. 76 (1945), pp. 261-322
(esp. 298-9).

[6] D. G. KENDALL, “On some modes of population growth leading to R. A. Fisher’s logarithmic
series distribution”. To appear in Biometrika.

[7] FRANCIS GALTON, Natural Inheritance, Macmillan, London, 1889.











PROBABILITY OF COINCIDENCE FOR TWO PERIODICALLY
RECURRING EVENTS¹


By Paul I. Richards


Brookhaven National Laboratory 


Summary. This paper contains a study of the following problem: Each of
two events recurs with definitely known period and duration, while the starting
time of each event is unknown. It is desired that, before the elapse of a certain
time, the events occur simultaneously and that this "overlap" be of at least a
given minimum duration.

The probability of this satisfactory coincidence is first evaluated, and it is
found that the solution, while mathematically adequate, is of no value for
practical application. This circumstance arises from the possibility that, with
certain rational ratios of the periods, the events may "lock in step".
Accordingly, an attempt is made to smooth the probability function with respect to
small variations in the ratio of the periods. Owing to difficulties in manipulating
the number-theoretic expressions involved, this smoothing is carried through
only by the use of certain approximations. Moreover, because of these same
difficulties, an averaged value of the probability itself is not obtained, but, in
its stead, there is derived a formula for that fraction of randomly related repeated
trials in which the original probability will be less than one-half.

Thus, the original problem is not completely solved. The results obtained,
however, do allow one to compare the relative advantages of different situations
and to make a rough estimate of the likelihood of success. Generally speaking,
the analysis is applicable whenever the ratio of "on time" to "off time" is small
for each event.


1. Introduction. Our problem may be represented schematically as follows:
Consider two pulse waves (Fig. 1) of periods $T_1$, $T_2$, pulse widths $t_1$, $t_2$, and
phases $\phi_1$, $\phi_2$. It is desired that these pulses overlap at least once within a given
time interval; moreover, an overlap is not satisfactory unless its duration is at
least as great as some assigned $t_m$. The starting phases $\phi_1$ and $\phi_2$ are unknown
for both waves. Our problem, then, would appear to be to calculate as a function
of time the probability of at least one overlap of duration at least $t_m$.

This probability will be calculated later; while mathematically adequate, it
is totally useless for practical application. This rather unusual occurrence in
applied mathematics arises from sources generally kept in mind only by
experimental physicists. Namely, the very nature of the science of measurement,
involving as it always does, at some stage, the use of the human senses, precludes
the availability of mathematically exact values of the parameters of the problem.
In other words, although experimental error can sometimes be made amazingly
small, it can never be eliminated.


¹ This work was done in part under Contract No. OEMsr-411 between Harvard
University and the Office of Scientific Research and Development, which assumes no
responsibility for the accuracy of the statements contained herein.


Now, as might be expected from the possibility that the waves may "lock in
step", our probability is extremely erratic with respect to very minute changes
in the periods $T_1$, $T_2$. For example, let $T_1 = T_2 = 100t_1 = 100t_2$ ($t_m = 0$); a
simple direct calculation then shows that, for all times greater than $T_1 = T_2$, the
desired probability is 0.03. Now if we let $T_1 = T_2 + \epsilon$, one wave will "creep up"
on the other, and eventually (for times greater than $T_1T_2/\epsilon$) the probability is
unity! Thus it may very well happen in a practical application that the
parameters are known to an accuracy essentially sufficient only to give the obvious
result: $0 \le P \le 1$.


[Fig. 1: the two pulse waves, with a period $T_1$ and the time origin $t = 0$ marked]


In the practical problem originally considered, uncertainty in the data arose 
not only from experimental error but also from slight instability of equipment. 
Thus some means of averaging over variations in the periods had to be found 
if the analysis was to be of any practical value whatsoever. 

For reasons which will appear in the later analysis, this smoothing entails 
difficulties which the author was unable to overcome with any great success; the 
nature of the results which have been obtained is discussed in the next section. 
These results involve several approximations which, generally speaking, are
based on the assumption that the ratios $t_i/T_i$ are both small.

It might be noted finally that the obviously favorable situations $t_1 > T_2$ or
$t_2 > T_1$ often cannot be used because of numerous practical difficulties.


2. Results. In this section, we shall summarize the results of the later 
analysis for the benefit of those readers not interested in the latter. At the end 
of this section, there is an outline of the practical application of the formulas. 













We shall continue to use the notation already introduced:


        $t_1, t_2$ = durations of the events;
(1)     $T_1, T_2$ = periods of the events;
        $t_m$ = minimum satisfactory duration of coincidence; and
        $P$ = probability of at least one satisfactory coincidence.


We shall also use the (at present) rather arbitrary notation:

        $t = (\text{time} - t_m)$
(2)     $P_0 = (t_1 - t_m)(t_2 - t_m)/T_1T_2$
        $w = (t_1 + t_2 - 2t_m)/T_1T_2$.


The probability function for short time intervals is:


(3)     $P = P_0 + wt$,  for $t \le \mathrm{Max}(T_1, T_2)$.

In any case:

(4)     $P \le P_0 + wt$.
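
As a hedged illustration of (2)-(4), the following Python sketch evaluates $P_0$, $w$ and the straight-line value $P_0 + wt$. The function names and the sample numbers are assumptions made for this example and do not come from the paper; the returned value is the probability itself only under the condition stated in (3), and otherwise only the upper bound of (4).

```python
def coincidence_parameters(t1, t2, T1, T2, tm):
    """Return (P0, w) as defined in (2)."""
    P0 = (t1 - tm) * (t2 - tm) / (T1 * T2)
    w = (t1 + t2 - 2 * tm) / (T1 * T2)
    return P0, w


def linear_probability(t1, t2, T1, T2, tm, elapsed):
    """Evaluate P0 + w*t with t = elapsed - tm, capped at 1.

    By (3) this is the probability itself while t <= max(T1, T2);
    by (4) it is an upper bound on the probability for any t.
    """
    t = elapsed - tm
    if t < 0:
        return 0.0  # no overlap of duration tm can be complete before time tm
    P0, w = coincidence_parameters(t1, t2, T1, T2, tm)
    return min(1.0, P0 + w * t)


# Hypothetical values: pulse widths 1 and 2, periods 100 and 130,
# minimum overlap 0.5, observation time 50.5 (so that t = 50).
print(linear_probability(1.0, 2.0, 100.0, 130.0, 0.5, 50.5))
```

Because the bound (4) is so cheap to evaluate, a sketch of this kind can be used to discard hopeless parameter choices before any of the later formulas are consulted.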


As already explained, the functional dependence of $P$ for large $t$ is of no
practical use, owing to its extremely erratic variation with small changes in the
periods $T_1$, $T_2$.

For reasons which will later become apparent, the only type of averaging which
has yet been carried to completion is the following. Consider that many trials
of equal length are made and that in each individual trial, all the parameters
are, by some mysterious device, held constant with absolute, mathematical
exactitude. Assume for definiteness that $T_2 \le T_1$. Between different trials,
let $t_2$ and $T_2$ vary in such a way that $T_1/T_2$ takes all values within a range of
$\tfrac{1}{2}$ with equal probability. (In the original problem, the ratios $t_i/T_i$ necessarily
remained constant.) The quantity $f$ given below then represents that fraction
of the trials in which the rigorous probability is less than an assigned value,
$P_0 + Q$. Thus the smaller $f$ is, the greater are the chances of success.

It must be admitted that this method assumes several things which are not
true in practice. First, the parameters of the problem probably vary by at
least a percent even within a single trial. More serious, the required variation
in $T_1/T_2$ may, in the extreme case $T_1 = T_2$, demand as much as 33% variation
in $T_2$. While considerable variation does occur, it is doubtful that it attains
this magnitude. Finally, the method assumes that $T_1$ stays fixed as $T_2$ varies,
whereas actually $T_1$ and $T_2$ vary simultaneously.

Despite these drawbacks, it was felt that the results were meaningful for the 
practical problem. In any case, they must serve until a more adequate analysis 
can be carried through. 

The reader will notice that the final results have the form of a "probability of a
probability". It would thus seem that a simple integration would yield a true
probability, but, unfortunately, the formulas for $f$ are reasonably accurate only
for $Q \le \tfrac{1}{2}$. The final formula for $f$ = fraction of trials in which $P < P_0 + Q$ is:


(5)     $f = 1$,  for $tw \le Q$,

        $f = 1.216\,Q\left\{1 + \left(\frac{tw}{Q} - 1\right)\log\left(1 - \frac{Q}{tw}\right)\right\}$,  for $tw > Q$, $Q \le 1/2$.

This expression is subject to error from several sources. First, it is an
approximation to a number-theoretic formula given in (31); this approximation is best
for $t$ and $Q/w$ large compared to Max($T_1$, $T_2$). A completely general comparison
of (31) and (5) = (33) is given in Fig. 2, where the agreement will be seen to be
quite adequate even for relatively small $t$ and $Q/w$. (The dotted contours are
straight lines passing through the origin.) When $t$ and $Q/w$ are small this first
source of error can be eliminated by using the solid contours of Fig. 2 in place
of (5).

Secondly, formula (31) itself is an approximation and involves the use of
simplified probability formulas and an assumption that $P_0$ and $w$ are constant
as $T_2$ varies. The maximum possible magnitude of these errors in (31) is given
by (parentheses indicate functional dependence):


(6)     [display illegible in the source]


where, as $T_2$ varies,


        $\underline{w}$, $\bar{w}$ = minimum, maximum values of $w$,
        [symbol illegible] = change in $P_0$,
        $q$ = maximum value of $w^2T_1T_2$.


Generally speaking, these errors are small if $t_i/T_i$ are small and if $t$ is large
compared to Max($T_1$, $T_2$). Also, there is considerable possibility that certain errors
will cancel in such a way as to make (6) correct with $q = 0$.

We shall now outline the practical use of these results. Given nominal values
of the parameters defined in (1), choose a convenient value for $Q \le \tfrac{1}{2}$ (usually
$Q = \tfrac{1}{2}$), and substitute into (2) to find $tw/Q$. From (5), one may then determine
$f$ = fraction of trials in which $P < P_0 + Q$. (Low values of $f$ are thus desirable.)
For computational convenience, (5) has been plotted in Fig. 3, while, above the
range of Fig. 3, the following lies within 1% of (5).


(7)     $f = 0.608\,(Q^2/tw)$   for $tw > 10Q$.


Note also that (4) may often be of considerable use in quickly eliminating cases
of very poor probability, and recall also that (3) will give the true, directly
meaningful probability whenever $t$ is no greater than Max($T_1$, $T_2$).
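
The practical procedure just outlined can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are invented here, the constants 1.216 and 0.608 are those printed in (5) and (7), and the restriction $Q \le \tfrac{1}{2}$ is enforced because the text gives the formula only for that case.

```python
import math


def fraction_below(t, w, Q):
    """Formula (5): fraction f of trials in which P < P0 + Q (for Q <= 1/2)."""
    if not 0.0 < Q <= 0.5:
        raise ValueError("formula (5) is stated only for 0 < Q <= 1/2")
    tw = t * w
    if tw <= Q:
        return 1.0
    x = Q / tw  # lies strictly between 0 and 1 here
    return 1.216 * Q * (1.0 + (1.0 / x - 1.0) * math.log(1.0 - x))


def fraction_below_large_tw(t, w, Q):
    """Formula (7): the simpler expression quoted for tw > 10 Q."""
    return 0.608 * Q * Q / (t * w)


# Hypothetical case: Q = 1/2 and tw = 10.
print(fraction_below(10.0, 1.0, 0.5), fraction_below_large_tw(10.0, 1.0, 0.5))
```

Since only the product $tw$ and $Q$ enter, the same two functions serve whatever units are used for time.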

Evaluation of the maximum possible error in $f$ as so obtained is more
complicated. If $t$ and $Q/w$ are small, Fig. 2 may be used to eliminate inexactness
due to the approximation of (31) by (5) = (33). Otherwise, this error may
safely be assumed to be negligible (less than 0.025; (31) may be employed
directly, but this is laborious unless $Q/w$ is small). The remaining errors, given
by (6), may change depending on how $T_2$ is assumed to vary. To make these
bounds as close as possible, it is best to choose $T_2$ = Min($T_1$, $T_2$) and then let
$T_2$ decrease from its nominal value by an amount sufficient to cause $T_1/T_2$ to
increase by $\tfrac{1}{2}$.

Fig. 2. Contours of $f/Q$: solid lines, (31); broken lines, (33)

The reader may have noticed that $f$ has a jump discontinuity as $t$ passes
through the value $Q/w$. This is not the result of approximations; it occurs also
in the number-theoretic formula (excepting only when Max($T_1$, $T_2$) = $1/2w$ and
$Q = \tfrac{1}{2}$) and merely means that the "lock in" phenomena are suddenly able to
have an effect when $t$ becomes greater than $Q/w$.




3. The probability function. Our problem has already been represented by
the pulse waves of Fig. 1. The starting phases $\phi_1$, $\phi_2$ of the waves are random,
and we desire the probability $P$ of at least one overlap of duration at least $t_m$
within a given time interval. Manifestly $P = 0$ until time $t_m$; hence we shall
give $t$ the meaning already assigned in (2).

Consider any sub-interval of width $t_m$. The range of phases favorable to
satisfactory coincidence on this interval is easily seen to be a rectangle with
sides $(t_1 - t_m)$, $(t_2 - t_m)$ in the phase plane $(\phi_1, \phi_2)$. By proper choice of the
(arbitrary) zero-phase reference, the small rectangle favorable to coincidence on
$(0, t_m)$ can be made to fall in the lower left corner of the phase plane (Fig. 4).


[Fig. 3: plot of (5)]


As we allow the sub-interval (width $t_m$) to advance in time, this small rectangle
will sweep out a strip along a 45° line (Fig. 4); its horizontal displacement, which
equals its vertical displacement, is given by $t$ as defined in (2). Since the phases
must be measured modulo the periods, we must "switch back" the strip whenever it
begins to leave the large rectangle $0 \le \phi_1 \le T_1$, $0 \le \phi_2 \le T_2$; this is
illustrated in Fig. 5.

The desired probability is then the area covered at least once by the strip
divided by $(T_1T_2)$, the total available area of the phase plane.

Using Fig. 4, one can easily show that, before the strip begins to overlap itself:


(8)     $P = P_0 + wt$,


where $t$, $P_0$, $w$ are defined in (2).
A rectangle with opposite sides identified, as in Fig. 5, is topologically
equivalent to a torus. This gives a good geometric picture of the overlap phenomena.













The strip winds diagonally about the torus until eventually (in general after 
several full circuits) it strikes sufficiently near its starting point to overlap itself 
on one edge. It then begins to fill the chinks between the previous circuits, and 
this single overlap continues until the chinks are almost filled. The strip then 
approaches its starting point from the side opposite to that on which single 
overlap occurred. Thereafter, only the center section of the strip is effective in 
increasing the area covered. This double overlap continues until the entire 
torus has been covered. A degenerate case is possible in which the strip, upon 
its first overlap, begins to retrace exactly its former path and the torus is never 
fully covered. This corresponds to interlocking of the original waves of Fig. 1. 

A rigorous proof of the above statements may be constructed by using the
fact that each change in behavior can occur only at the starting point. In this
manner, it is easily shown that: (a) single and double overlap occur in that order,
(b) the strip area effective in covering changes only upon a change in the type of
overlap, and (c) the two types of overlap must occur on opposite sides of the
starting point.

[Fig. 4: the small rectangle and its 45° strip in the phase plane]
[Fig. 5: the strip switched back into the large rectangle]

The facts (a, b, c) may then be used to derive the probability function. For
the analytic treatment, it is best to return to the $(\phi_1, \phi_2)$ plane. Overlap of any
type will first occur when the "unswitched-back" strip approaches sufficiently
near a point $(n_1T_1, n_2T_2)$ where $n_1$ and $n_2$ are non-negative integers not both zero.
The analysis is greatly shortened by noticing that the behavior is completely
determined by the distance of the line $\phi_1 = \phi_2$ from such points (even though
the strip is not centered on this line), while the width of the strip is (Fig. 4)
$wT_1T_2/\sqrt{2}$.

A slight fine-structure may arise in the probability function where it changes
slope, depending on whether or not the leading corner of the moving rectangle
strikes one of the sides of the original small rectangle. These effects are small
if $t_i/T_i$ are small and will be neglected below by supposing the strip to be
generated by a line segment oriented perpendicularly to its path. The error arising
from this procedure consists essentially in a delay or advance in the time at
which $P$ changes slope. It may be seen that the maximum effect represents a
delay of $\Delta t = wT_1T_2/2$. The error introduced is then less than $\Delta t\sqrt{2}$ multiplied
by that portion of the total width of the strip which becomes ineffective due to
the overlap considered. The sum of these effects must be less than that given
by using the total width of the strip; this gives the maximum error $w^2T_1T_2/2$.

The results of the method outlined are then as follows. Single overlap occurs
at $t = s$ where


(9)     $s = \tfrac{1}{2}(n_1T_1 + n_2T_2)$,


and $(n_1, n_2)$ is that pair of non-negative integers not both zero such that $s$ is a
minimum and

(10)    $p_1 = \left|\frac{n_1}{T_2} - \frac{n_2}{T_1}\right| < w$.

Double overlap occurs at $t = d$, where

(11)    $d = \tfrac{1}{2}(\nu_1T_1 + \nu_2T_2)$,


and $(\nu_1, \nu_2)$ is that pair of non-negative integers not both zero such that $d$ is a
minimum and the conditions

        $\left|\frac{\nu_1}{T_2} - \frac{\nu_2}{T_1}\right| < w$,
(12)
        $\left(\frac{\nu_1}{T_2} - \frac{\nu_2}{T_1}\right)\left(\frac{n_1}{T_2} - \frac{n_2}{T_1}\right) < 0$

are satisfied. If we set

(13)    $p_2 = p_1 + \left|\frac{\nu_1}{T_2} - \frac{\nu_2}{T_1}\right| - w$,

the probability function is then

        $P = P_0 + wt$,  for $t \le s$,
(14)    $P = P_0 + sw + (t - s)p_1$,  for $s \le t \le d$,
        $P = P_0 + sw + (d - s)p_1 + (t - d)p_2$,  for $d \le t$,


where it is understood that $P = 1$ if (14) gives $P > 1$.

The degenerate case where the waves interlock is given correctly by this
formalism. Namely, if the strip starts to retrace its path exactly, then $p_1 = 0$
and the second part of (12) shows that $d$ does not exist. Equation (14) then
gives the correct result: $P$ rises to the value $P_0 + sw$ and never increases further.
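
The recipe (9)-(14) can be sketched as a brute-force search in Python. Everything named here (`overlap_schedule`, `probability`, the cutoff `n_max`, the sample periods) is an assumption of this illustration, not part of the paper; exact rational arithmetic is used so that the interlock case $p_1 = 0$ is recognized without rounding error.

```python
from fractions import Fraction


def overlap_schedule(T1, T2, w, n_max=50):
    """Search integer pairs up to n_max for (s, p1, d, p2) of (9)-(13).

    T1, T2 and w should be ints or Fractions. d and p2 are None when no
    pair satisfies (12), which is the interlock case.
    """
    candidates = []  # (time, signed difference) for pairs satisfying |diff| < w
    for n1 in range(n_max + 1):
        for n2 in range(n_max + 1):
            if n1 == 0 and n2 == 0:
                continue
            diff = Fraction(n1) / T2 - Fraction(n2) / T1
            if abs(diff) < w:
                candidates.append((Fraction(n1 * T1 + n2 * T2) / 2, diff))
    if not candidates:
        return None, None, None, None
    s, diff_s = min(candidates, key=lambda c: c[0])
    p1 = abs(diff_s)
    opposite = [c for c in candidates if c[1] * diff_s < 0]  # second part of (12)
    if not opposite:
        return s, p1, None, None
    d, diff_d = min(opposite, key=lambda c: c[0])
    return s, p1, d, p1 + abs(diff_d) - w


def probability(t, P0, w, schedule):
    """Piecewise-linear probability (14), capped at 1."""
    s, p1, d, p2 = schedule
    if s is None or t <= s:
        P = P0 + w * t
    elif d is None or t <= d:
        P = P0 + s * w + (t - s) * p1
    else:
        P = P0 + s * w + (d - s) * p1 + (t - d) * p2
    return min(P, 1)


# Hypothetical interlock case: periods in the exact ratio 3:2.
locked = overlap_schedule(3, 2, Fraction(1, 40))
print(locked, probability(100, Fraction(1, 1200), Fraction(1, 40), locked))
```

In the interlock example the search returns $p_1 = 0$ and no $d$, so the probability stops growing at $P_0 + sw$ however long the trial lasts, as described above.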


4. The method of smoothing. We have already discussed in section 1 the
inadequacy of the formal mathematical solution (14) for purposes of practical
application. Either mathematical analysis or intuitive consideration of interlock
shows that the erratic behavior of $P$ is due almost entirely to small changes
in the ratio $T_1/T_2$. As this ratio passes through certain rational values,
possibilities of interlock appear and disappear. Consequently, we next alter (14)
to a form in which the dependence on this ratio is more evident.

We may, without loss of generality, assume:


(15)    $T_1 = 1$,   $T_2 \le 1$.

Also introduce the standard notation:

(16)    $[x]$ = (largest integer $\le x$).

It will then be seen that (10) and (12) may be thrown into the form²

(17)    $k$ = smallest positive integer such that $p_1 = |k\epsilon - i| < w$  ($i$ = integer);
(18)    $K$ = smallest positive integer such that $|K\epsilon - I| < w$ and also
        $(k\epsilon - i)(K\epsilon - I) < 0$  ($I$ = integer);


where either

(19)    $\epsilon = \frac{1}{T_2} - \left[\frac{1}{T_2}\right]$   or   $\epsilon = 1 - \frac{1}{T_2} + \left[\frac{1}{T_2}\right]$.

Now from (9) and (10), we note that $s$ differs from $n_1T_1$ by at most $wT_1T_2/2$,
while from (11) and (12), $d$ differs from $\nu_1T_1$ by less than the same amount.
Moreover, by the second half of (12), $d$ is thereby made too small if $s$ has been
made too large and vice versa. Hence the use of these approximations in (14)
will contribute an error certainly less than $w^2T_1T_2/2$. Adding the error
discussed in section 3, the total introduced thus far cannot exceed $w^2T_1T_2$.

We thus use in the present notation $s = k$, $d = K$; (13) and (14) then become:


(20)    $p_2 = p_1 + |K\epsilon - I| - w$

        (a) $P = P_0 + wt$,  for $t \le k$
(21)    (b) $P = P_0 + kw + (t - k)p_1$,  for $k \le t \le K$
        (c) $P = P_0 + kw + (K - k)p_1 + (t - K)p_2$,  for $K \le t$


where, as before, $P = 1$ if (21) gives a value greater than unity. Equations
(17)-(21) are the formulation which will be used, with conditions (15),
henceforth.
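
To make the integer conditions (17) and (18) concrete, here is a small Python sketch that finds $k$ and $K$ by direct search. The helper names, the search cutoff `n_max` and the sample values of $\epsilon$ and $w$ are assumptions chosen for illustration; exact fractions are used so that an exact interlock ($k\epsilon$ equal to an integer) is recognized.

```python
from fractions import Fraction


def signed_offset(n, eps):
    """n*eps minus the nearest integer: signed distance from the origin
    when multiples of eps are laid off around a circle of unit circumference."""
    x = n * eps
    return x - round(x)


def find_k_and_K(eps, w, n_max=1000):
    """Return (k, K) of (17) and (18); K is None when k*eps is exactly an integer."""
    k = next((n for n in range(1, n_max + 1)
              if abs(signed_offset(n, eps)) < w), None)
    if k is None:
        return None, None
    first = signed_offset(k, eps)
    K = next((n for n in range(1, n_max + 1)
              if abs(signed_offset(n, eps)) < w
              and signed_offset(n, eps) * first < 0), None)
    return k, K


# Hypothetical values: eps slightly above 3/8, and eps exactly 3/8.
print(find_k_and_K(Fraction(382, 1000), Fraction(1, 10)))
print(find_k_and_K(Fraction(3, 8), Fraction(1, 10)))
```

With $k$ and $K$ in hand, $p_1 = |k\epsilon - i|$ is the absolute value of the first offset and $p_2$ follows from (20), after which (21) is a direct evaluation.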

We wish now to smooth $P$ with respect to variations in $\epsilon$. The
number-theoretic requirement (17) is extremely difficult to work with. For reasons of
simplicity, then, we shall assume that $\epsilon$ is the only parameter which changes as
$T_2$ is varied. The errors which may arise from this assumption are treated at
the end of section 5.


² Note that, even though the periods appear explicitly only in (19) hereafter, all the
following equations are true only for $T_1 = 1$. (This is evident if we recall that $w$ has the
dimensions of inverse time.) Thus we are definitely assuming that $T_1$ = constant.


From (19), or from the absolute value signs in (17), (18), it will be seen that
all possible situations arise if $\epsilon$ varies merely from zero to one-half. In order
that this should entail as little variation in $T_2$ as possible, our conventions should
be chosen as already stated in (15). Even under these circumstances, a
maximum variation of 33% in $T_2$ may be required to cover the range $\epsilon = 0$ to $\tfrac{1}{2}$.

Equation (21) cannot be used directly without the interpretational convention
there noted. This leads to difficulties of treatment which the author was unable
to solve. The difficulties may be avoided by the following device, which
admittedly has less direct significance than an averaged value for $P$.

We enquire after the fraction $f$ of the range of $\epsilon$ over which $P$ has a value (at
fixed $t$) less than some given value $Q + P_0$. We may then say that, if a large
number of trials each of length $t$ is made, then in $f$ of them, the probability of
coincidence will be less than $Q + P_0$.


5. Calculation of $f$. The exceptional behavior of $P$ is that caused by interlock
possibilities. This corresponds to $p_1 = 0$ in (17). Thus the exceptional values
of $P$ center about the points $\epsilon = i/k$, where $i$ and $k$ are relatively prime
(otherwise, $k$ would not be the smallest integer satisfying (17)). Moreover, by a
standard theorem [1], $k \le 1/w$. Thus the critical points form the Farey series
of order $1/w$ in the range $(0, \tfrac{1}{2})$. About each Farey point, we may suspect that
there will be an interval over which $k$ is constant, and that the entire range may
thereby be divided up into ranges of constant $k$.
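
For readers who wish to see the critical points explicitly, the following Python sketch lists the Farey fractions of a given order that lie between 0 and $\tfrac{1}{2}$. The function name is an assumption of this example, and the order is taken to be an integer (in the text it would be the integer part of $1/w$).

```python
from fractions import Fraction


def farey_points_lower_half(order):
    """Fractions i/k in lowest terms with k <= order and 0 <= i/k <= 1/2, ascending."""
    points = {Fraction(i, k)            # Fraction reduces i/k automatically
              for k in range(1, order + 1)
              for i in range(0, k // 2 + 1)}
    return sorted(points)


# Hypothetical order 5, e.g. for w = 0.2.
print(farey_points_lower_half(5))
```

Counting the points of a fixed denominator $k \ge 3$ in this list gives half of Euler's function of $k$; for example, order 7 yields the three points 1/7, 2/7 and 3/7 with denominator 7.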

In thinking about the use of (17) in a typical calculation, it is convenient to
eliminate the integer $i$ by representing multiples of $\epsilon$ as a series of points
progressing around and around a circle of unit circumference. When $\epsilon = i/k$, the
$k$th multiple will (after $i$ revolutions) coincide with the origin; this and the
earlier points, it is easily shown, will be distributed uniformly about the circle
with a separation $1/k$.

As $\epsilon$ moves away from the Farey point, $k$ will, by definition (17), remain
constant until either (a) the point $k\epsilon$ moves a distance greater than $w$ from the
origin or (b) an earlier point moves to a distance less than $w$ from the origin
(Fig. 6).

Let $(m\epsilon)$ be that earlier point nearest (initially $1/k$ from) the origin and moving
toward it as $\epsilon$ varies in a particular direction. Of course,

(22)    $m < k$.

For each Farey point, there will be two values of $m$; one for decreasing $\epsilon$ and
one for increasing $\epsilon$. If we introduce the new variable: $h$ = the absolute value
of the change in $\epsilon$ from the Farey point $i/k$, then each point, $n\epsilon$, on the reference
circle will move a distance $nh$, and (17) gives as the conditions for constant $k$
(Fig. 7):

        (a) $w > kh = p_1$,
(23)
        (b) $mh < (1/k) - w$.










Thus we have divided the range $(0, \tfrac{1}{2})$ into small ranges where $k$ (and $m$) are
fixed. The number of small ranges is roughly twice the number of Farey points
in $(0, \tfrac{1}{2})$.

[Fig. 6: two examples, on the reference circle, of the change in $k$ as $\epsilon$ moves away
from a Farey point]

Within each small range $p_1$, $K$, $p_2$ still vary with $\epsilon$. The behavior of $p_1$ is
already given in (23a); we shall find that we do not need $p_2$. Using (18) and
Fig. 7, it may easily be shown that:

(24)    $K = m + jk + k$,

where

(25)    $j + a = (1 - mkh - kw)/k^2h$,   $j = [j + a]$,   $0 \le a < 1$.

From (23a), (24), (25), we obtain:

(26)    $(K - k)p_1 = 1 - kw - ak^2h$   $(0 \le a < 1)$.


Having thus divided the range of $\epsilon$ into small regions within each of which the
number-theoretic requirements (17, 18) take a relatively simple form, we must
now turn to the calculation of $f$ = that fraction of the range $\epsilon = (0, \tfrac{1}{2})$ over which
$P < P_0 + Q$ at fixed $t$. We shall specialize the further analysis to the case
$Q \le \tfrac{1}{2}$. This considerably shortens the discussion and yields essentially all the
useful results of the more general inquiry.

We first note from (21) that, since $p_2 \le p_1 \le w$ (i.e. because of (4)), we have
$P \le P_0 + Q$ independently of $\epsilon$ if $t \le Q/w$:


(27)    $f = 1$,  for $t \le Q/w$.


Similar reasoning shows on the other hand that, when $t > Q/w$, those regions
with $k > Q/w$ do not contribute to $f$. In the following, we shall therefore
employ:


(28)    $k \le Q/w < t$,   $Q \le \tfrac{1}{2}$.


Equation (28) implies that we must use either (21b) or (21c); we shall next
show that we do not need (21c). The value of $P$ whenever (21c) is applicable is
certainly greater than $(P_0 + kw + (K - k)p_1)$. From (26), this value is equal to
$(P_0 + 1 - ak^2h)$. Now from (28), $w \le 1/(2k)$, whence by (23a) $h < 1/(2k^2) < 1/(2ak^2)$
(since $a < 1$). Thus $(P_0 + 1 - ak^2h) > P_0 + \tfrac{1}{2} \ge P_0 + Q$, and consequently
(21c) never applies until $P > P_0 + Q$. (This means merely that the double
overlap discussed in section 3 cannot occur until at least half the torus is covered.)
Accordingly, we can confine our attention entirely to (21b) in any further
discussion of $f$.


Substituting for $p_1$ from (23) and recalling that $(t - k)$ is positive (by (28)),
we find from (21b) that the condition $P < P_0 + Q$ becomes:


(29)    $h < \frac{Q - kw}{k(t - k)}$.

However, $h$ is subject also to the restrictions (23), which insure that we do not
stray from the small region where $k$ is constant. We assert that (29) implies
(23) and may therefore be used as the final expression of the requirement
$P < P_0 + Q$.

To prove this, note first that (29) and (28) immediately give $h < w/k$, which
is (23a). Secondly, (28) implies $1/k \ge 2w$ so that, using (23a) and (22):
$(1/k) - w \ge w > kh > mh$, which is (23b).

Thus we arrive at the result that $f$ receives contributions only from those
elementary regions where $k$ satisfies (28) and that the contribution of each such
region is governed by (29).

Since the variable $h$ was defined as the absolute value of the change of $\epsilon$ from
the Farey point $i/k$, each Farey point (satisfying (28)) contributes an amount
equal to twice³ the right-hand side of (29). Since this amount is independent
of $i$, we may immediately sum over all Farey points $i/k$ with fixed $k$. There
are $\tfrac{1}{2}\phi(k)$ such points³ in the range $(0, \tfrac{1}{2})$, where Euler's function $\phi$ is defined by:


(30)    $\phi(k)$ = the number of integers $\le k$ and relatively prime to $k$.


(Note that $\phi(k)$ is even for $k \ge 3$ since if $k$ and $i$ have no common divisor > 1,
neither do $k$ and $k - i$.)


³ This is not true of the Farey points 0 and $\tfrac{1}{2}$, the ends of the range of $\epsilon$, but the terms
$k = 1, 2$ in (31) correctly account for these contributions since $\phi(1) = \phi(2) = 1$.

Thus, summing over all these contributions and dividing by the length of the
total range:


(31)    $f = 2 \sum_{1 \le k \le Q/w} \phi(k)\,\frac{Q - kw}{k(t - k)}$   for $t > Q/w$.
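
The sum (31) is laborious by hand but short to program. The Python sketch below is one way to evaluate it; the function names are assumptions of this example, Euler's function is computed naively from its definition (30), and the quantities are understood in the normalized units of (15), with $Q \le \tfrac{1}{2}$.

```python
from math import floor, gcd


def euler_phi(k):
    """Euler's function as defined in (30)."""
    return sum(1 for i in range(1, k + 1) if gcd(i, k) == 1)


def fraction_exact(t, w, Q):
    """Formula (31) for t > Q/w, together with (27) for t <= Q/w."""
    if t * w <= Q:
        return 1.0
    k_max = floor(Q / w)
    return 2.0 * sum(euler_phi(k) * (Q - k * w) / (k * (t - k))
                     for k in range(1, k_max + 1))


# Hypothetical values: w = 0.05, Q = 1/2, t = 20, so that k runs from 1 to 10.
print(fraction_exact(20.0, 0.05, 0.5))
```

Since $t > Q/w \ge k$ for every term, the denominators $k(t - k)$ are always positive and no special handling is required.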

Regarding error in (31) due to the inaccuracy of (21), note that this can enter
only when we set $P = P_0 + Q$ in deriving (29). Actually the difference between
(21b) and the correct value of $P$ will change as $\epsilon$ is changed so that there is
considerable possibility that these effects will cancel out in (31). (In fact, a
detailed study shows that the error in (21b) assumes opposite signs as $\epsilon$ varies in
opposite directions from any given Farey point.) In any case, because (31) is
monotone in $Q$, the error in (31) can be no greater than that found by
substituting $Q + w^2T_1T_2$ for $Q$. Taking account also of the variation of $P_0$ with $T_2$,
the same argument establishes the "$Q$-dependence" of (6) given in section 2.

Finally, we investigate the error due to change in $w$ with $T_2$. If $\bar{w}$ is the
maximum value of $w$, Farey points with $k \le Q/\bar{w}$ are certain to contribute to $f$, and
this contribution will be at least as great as $(Q - k\bar{w})/k(t - k)$ so that $f \ge f(\bar{w})$.
On the other hand, if $\underline{w}$ is the minimum value of $w$, Farey points with $k > Q/\underline{w}$
cannot possibly contribute to $f$, and the remaining points can contribute no
more than $(Q - k\underline{w})/k(t - k)$ so that $f \le f(\underline{w})$. Hence we arrive at the final
statement (6) in section 2.


6. Approximations for $f$. Computational difficulties in the use of (31)
suggested approximating it by a more readily computed expression. By a standard
theorem [1, p. 266]:


(32)    $\phi(k) \approx 6k/\pi^2$.

We may then approximate (31) by:

        $f = 1.216 \int_{1/2}^{(Q/w) + 1/2} \frac{Q - kw}{t - k}\,dk$

        $\;\;= 1.216\,Q \left(1 + \frac{tw - Q}{Q} \log \frac{tw - Q - \tfrac{1}{2}w}{tw - \tfrac{1}{2}w}\right)$.

If $Q/w$ is large compared to $\tfrac{1}{2}$ (recall $t > Q/w$), this becomes very nearly:

(33)    $f = 1.216\,Q \left(1 + \left(\frac{tw}{Q} - 1\right) \log\left(1 - \frac{Q}{tw}\right)\right)$   for $t > Q/w$.


Despite the cavalier derivation of (33), its agreement with (31) is remarkably
close. Fig. 2 shows a perfectly general comparison of (31) and (33), where the
agreement will be seen to be fairly good even for $t$ and $Q/w$ of the order of 4 or 5.
Note also that (33) nearly always gives a value of $f$ that is too large.
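
In place of the contours of Fig. 2, the agreement can be checked numerically. The following Python sketch tabulates (31) against (33) for a few trial lengths; the function names and the sample values $w = 0.05$, $Q = \tfrac{1}{2}$ are assumptions chosen for illustration, and both functions presuppose $t > Q/w$ in the normalized units of (15).

```python
from math import gcd, log


def f_number_theoretic(t, w, Q):
    """Formula (31); requires t > Q/w and Q <= 1/2."""
    total = 0.0
    k = 1
    while k * w <= Q:
        phi = sum(1 for i in range(1, k + 1) if gcd(i, k) == 1)  # Euler's function (30)
        total += phi * (Q - k * w) / (k * (t - k))
        k += 1
    return 2.0 * total


def f_smooth(t, w, Q):
    """Formula (33); same conditions as above."""
    x = Q / (t * w)
    return 1.216 * Q * (1.0 + (1.0 / x - 1.0) * log(1.0 - x))


for t in (12.0, 20.0, 50.0):
    print(t, round(f_number_theoretic(t, 0.05, 0.5), 4), round(f_smooth(t, 0.05, 0.5), 4))
```

A table of this kind is a quick way to decide, for a given problem, whether the closed form is accurate enough or whether the sum should be used directly.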

For completeness, we may repeat (27).


(34)    $f = 1$   for $t \le Q/w$.


Note that only the dimensionless quantities $tw$, $Q$ enter into (33, 34) which are
therefore independent of the normalization (15).


REFERENCE 


[1] G. H. Hardy and E. M. Wright, An Introduction to the Theory of Numbers, Clarendon
Press, Oxford, 1938, p. 30.











NONPARAMETRIC ESTIMATION, III. STATISTICALLY EQUIVALENT 
BLOCKS AND MULTIVARIATE TOLERANCE 
REGIONS—THE DISCONTINUOUS CASE 


By John W. Tukey


Princeton University 


1. Summary. In Paper II of this series [2, 1947] it was shown that if $n$
functions and a sample of $n$ were used to divide the population space into $n + 1$
blocks in a particular way, and if the joint cumulative of the functions were
continuous, then the $n + 1$ fractions of the population, corresponding to the $n + 1$
blocks, were distributed symmetrically and simply.

In Paper I of this series [1, 1945] it was shown that the one-dimensional theory
of tolerance regions could be extended to the discontinuous case, if equalities were
replaced by inequalities.

In this paper the results of Paper II will be extended to the discontinuous case
with the same weakening of the conclusion. The devices involved are more
complex, but the nature of the results is the same (see Section 5).

As a tool, it is shown that any n-variate distribution can be represented in 
terms of an n-variate distribution with a continuous joint cumulative (in fact, 
with uniform univariate marginals), where each variate of the given distribution 
is a different monotone function of the corresponding variate from the continuous 
distribution. 


2. Introduction. The importance of extending the simple results of the 
continuous case to the more complex results of the discontinuous case may not 
be clear at first thought. Yet all the data with which the statistician actually 
works comes from discontinuous distributions. Often these distributions are very 
fine-grained—the distributions of the number of eggs laid by codfish and of the 
measured wavelengths of a spectral line (measured in 0.000001 A) do not have 
large concentrated probabilities, but all their probability is concentrated at
discrete points. Insofar as the considerations of the theoretical statistician apply
to the data as received rather than to the “data” of a more or less imaginary 
model, these considerations apply to data with a discrete distribution. When 
his theories are erected on a basis of a probability density function, or even a 
continuous cumulative, there is a definite extrapolation from theory to practice. 
It is, ultimately, a responsibility of the mathematical statistician to study dis- 
crete models and find out the dangerous large effects and the pleasant small 
effects which go with such extrapolation. We all deal with discrete data, and must 
sooner or later face this fact. 

In order to deal with the discontinuous case, we must face two problems (we
assume that the reader is familiar with Paper II [2]):

(1) What to do about "ties"?

(2) Finite probabilities associated with cuts.

The first of these is peculiar to the multivariate situation and can be easily
explained by an example. Consider the three points in the plane with coordinates
(1, 9), (3, 9) and (2, 6). Let the first two functions be $y$ and $x$; then the
procedure of Section 4 of Paper II [2] is not unique, and two possibilities arise:

Alternative A. (1, 9) is selected as having the largest $y$, and (3, 9) as having the
largest $x$ among the remaining (two) points, hence $S_1 = \{(x, y) \mid y > 9\}$,
$S_2 = \{(x, y) \mid y < 9, x > 3\}$, $S_3 = \{(x, y) \mid y < 9, x < 3\}$.

Alternative B. (3, 9) is selected as having the largest $y$, and (2, 6) as having the
largest $x$ among the remaining (two) points, hence $S_1 = \{(x, y) \mid y > 9\}$,
$S_2 = \{(x, y) \mid y < 9, x > 2\}$, $S_3 = \{(x, y) \mid y < 9, x < 2\}$.

Notice that the two alternatives give different sets $S_2$. The procedure is not
unique. In the continuous case, ties happen with probability zero, hence their
consequences could be neglected. This is now no longer the case.

This difficulty is solved by using more functions and the idea of lexicographical
(like a dictionary!) ordering. In the simplest case, we add no new functions and
proceed as follows: If there is a unique $i$ for which $\varphi_1(w_i)$ is maximal,
select it. Otherwise look among the $w_i$ for which $\varphi_1(w_i)$ is maximal, and
look at the values of $\varphi_2(w_i)$. If there is a unique such $i$ for which
$\varphi_2(w_i)$ is maximal, select it. If not, go on to $\varphi_3(w_i)$, and so on.
This procedure leads to a specific $i$ unless $\varphi_h(w_j) = \varphi_h(w_k)$
for all $h$ and some $j \ne k$. But in this case it does not matter whether $j$ or
$k$ is selected; the set of $m$-tuples
$(\varphi_1(w_i), \varphi_2(w_i), \dots, \varphi_m(w_i))$ remaining will be the
same, although the indices $i$ will not. But the indices play no role in the actual
construction.
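
The simplest tie-breaking rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `select_lexicographic` is hypothetical, and the two functions are taken to be y followed by x, as in the three-point example.

```python
# Minimal sketch of the simplest tie-breaking rule (no extra functions).
# The helper name and the data layout are illustrative assumptions.

def select_lexicographic(points, functions):
    """Index of the point whose tuple (f1(w), f2(w), ...) is lexicographically largest."""
    keys = [tuple(f(w) for f in functions) for w in points]
    return keys.index(max(keys))  # Python compares tuples lexicographically

points = [(1, 9), (3, 9), (2, 6)]
phi_1 = lambda w: w[1]  # y
phi_2 = lambda w: w[0]  # x

# First selection: maximize (phi_1, phi_2); the tie in y is broken by x.
first = select_lexicographic(points, [phi_1, phi_2])
first_point = points[first]

# Second selection: start from phi_2 among the remaining points.
remaining = [w for k, w in enumerate(points) if k != first]
second_point = remaining[select_lexicographic(remaining, [phi_2])]
```

With these inputs the first selection is (3, 9) and the second is (2, 6), which is Alternative B.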

As an example, consider the following 20 four-letter words as a sample and let
there be four functions, $\varphi_i$ being the negative of the position in the
alphabet of the $i$-th letter of the word. (Thus a > b > c > ... > z.)

Sample: meet, west, made, gone, come, back, said, that, maid, well, with, with, 
just, week, very, near, edge, this, last, have. (The Law of the Three Just Men, 
Edgar Wallace, pp. 159-160). 

Selections: back, made, near, (gone, come, edge, have. The fourth selection to 
be made at random among these four.) The inferences which can be made about 
the four-letter words in Edgar Wallace’s writing vocabulary are left to the reader. 

We have just given one rule for breaking ties, one which chooses Alternative B
in our example. But we might prefer a rule which chooses Alternative A. To
get more generality, we have only to take $M$ functions, $M \ge m$, and let
$\varphi_{p(1)}, \varphi_{p(2)}, \dots, \varphi_{p(m)}$ (where we may suppose
$p(1) = 1$ without loss of generality) play the role just taken by
$\varphi_1, \varphi_2, \dots, \varphi_m$. Thus if the maximum of $\varphi_1(w)$ is
not unique proceed to $\varphi_2(w)$, thence to $\varphi_3(w)$, ..., thence to
$\varphi_M(w)$. For the second block, start with $\varphi_{p(2)}$, then
$\varphi_{p(2)+1}, \varphi_{p(2)+2}, \dots, \varphi_M$. And so on. The choice

    $\varphi_1(x, y) = y$,
    $\varphi_2(x, y) = -x$,
    $\varphi_3(x, y) =$ [illegible in the source],
    $\varphi_4(x, y) = x$,
    $\varphi_5(x, y) = y^2$,

with $p(1) = 1$ and $p(2) = 4$, leads to Alternative A above. (Note that
$\varphi_3$ is a dummy in the sense that it is never used.) The problem of ties,
which was a problem in uniqueness of construction, is thus dealt with.

Next we must deal with the cuts. When we made $S_1$, $S_2$ and $S_{2|4}$ in
Alternative A, we omitted some points, namely

    $T_1 = \{(x, y) \mid y = 9\}$, and $T_2 = \{(x, y) \mid y < 9,\ x = 3\}$.


In the continuous case this did not matter, since these sets had probability zero 
and could be avoided. Here they cannot, and we shall have to consider a family 
of blocks (in the wide sense) as consisting of the blocks S and the cuts T. The 
solution of the univariate case in Paper I [1] shows us that what we must expect 
is that: 


    $\Pr\{\text{coverage of } S_i + T_{i-1} + T_i \ge t\} \ge \Pr\{\text{coverage of one continuous-case block} \ge t\} \ge \Pr\{\text{coverage of } S_i \ge t\}$.


That is, if we want a certain set of blocks to cover (together) at least a certain 
amount with a certain probability we must add the adjoining cuts; and if we 
want a certain set of blocks to cover at most a certain amount with a certain
probability we may add only those cuts which do not adjoin blocks not in our set.
By introducing the cuts explicitly, we solve the second problem. 

In order to reduce the size of the cuts, our detailed definitions will differ in
detail from those which we have used so far. In the example, where the functions
leading to Alternative A are used, we place in $S_1$ not only the points with
$y > 9$, but also those with $y = 9$ and $-x > -1$; we place in $S_2$ not only the
points with $y < 9$ and $x > 3$ and the points with $y < 9$, $x = 3$, $y^2 > 49$,
but also those with $y = 9$ and $-x < -1$. Proceeding in this way, we reduce $T_1$
to the point $x = 1$, $y = 9$ and $T_2$ to the point $x = 3$, $y = 9$. This
reduction can only diminish the probability associated with the cuts, but we cannot
be sure that it will reduce it to zero.

Only in the quasi-trivial case, where the probability that all functions shall tie 
together is zero, do we return to the simplicity of the continuous case. This case 
is quasi-trivial because it does not arise with discrete probabilities, and real ob- 
servations always involve discrete probabilities. 

Having discussed the results, we should now briefly touch on the methods. 
The proof of the main theorems depends on two facts: 

(1) a representation theorem, (5.3), and 

(2) a lemma, (6.1) which shows that m functions would be enough if (i) the 
distribution were fixed, and (ii) cases of probability zero were neglected. The 




representation theorem has been outlined in the summary. It is analogous
to, but a definite extension of, the one used in Paper I [1]. It seems to be new in
statement, though not in thought—it will surprise few probability theorists. The 
novel element is the monotonicity of the functions, which is utterly essential for 
our purposes. 

The lemma allows us to reduce the general case to the case of no extra func- 
tions, where the reduction must be made differently for each underlying distri- 
bution. The reduced functions are then represented by the representation 
theorem and the results of Paper II [2] are taken over. The results are stated 
in a form independent of the underlying distribution and the particular repre- 
sentation, hence they apply in general. 

The last paragraph stresses the principle common to Paper I [1] and this paper. 
It is natural to call it the “iceberg principle,” and to sketch it as follows: “We
have some information about the visible one-ninth of the iceberg, and we want 
to conclude something about this visible part. If we can imagine another eight- 
ninths, consistent with the part we know, and if using that we can prove some- 
thing expressed solely in terms of the visible part, then this is the required proof. 
(The only essential is to be able to match every visible part.)” Both the reduced 
functions (which depend on the underlying distribution) and the uniform
variables used to represent them are part of the invisible eight-ninths which
“could be there.”


3. Terminology and Notation. In general we use the terminology and nota- 
tion of Paper II [2], and we shall continue to assume that all functions concerned 
in the argument are measurable. 

Given two finite sequences of the same length, we write
$(a_1, a_2, \dots, a_m) > (b_1, b_2, \dots, b_m)$ if any of the following hold:

    $a_1 > b_1$,
    $a_1 = b_1$, and $a_2 > b_2$,
    $a_1 = b_1$, $a_2 = b_2$, and $a_3 > b_3$,
    ...
    $a_i = b_i$ for $i < m$, and $a_m > b_m$.

This is the lexicographical order referred to above. (We interpret
$(a_1, a_2, \dots, a_m) < (b_1, b_2, \dots, b_m)$ to mean
$(b_1, b_2, \dots, b_m) > (a_1, a_2, \dots, a_m)$ and = to mean identity.)
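
The ordering just defined can be written out directly. The sketch below is illustrative only; the function name `lex_greater` is an assumption. It follows the case-by-case definition and agrees with Python's built-in tuple comparison, which is also lexicographic.

```python
# Lexicographical order on finite sequences of equal length, following the
# case-by-case definition in the text. The function name is illustrative.

def lex_greater(a, b):
    """True if a > b in the lexicographical order; a and b have equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    for a_i, b_i in zip(a, b):
        if a_i > b_i:       # first differing coordinate decides
            return True
        if a_i < b_i:
            return False
    return False            # all coordinates equal: identity, not ">"

examples = [((9, 3), (9, 1)), ((6, 2), (9, 1)), ((1, 2, 3), (1, 2, 3))]
results = [lex_greater(a, b) for a, b in examples]
```

In practice one may simply compare tuples with `>`; the explicit loop is shown only to mirror the definition.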

3.1 DEFINITION: Given a sequence of real-valued functions
$\varphi_1, \varphi_2, \dots, \varphi_M$ and a sequence of starting indices
$p(1), p(2), \dots, p(m)$ (which we shall often refer to, briefly, as an m-system
of functions $\varphi_1, \varphi_2, \dots, \varphi_M$, without explicitly
mentioning the starting indices), the functions $\Phi_1, \Phi_2, \dots, \Phi_m$ are
defined as follows:

(3.2)    $\Phi_k(w) = \{\varphi_{p(k)}(w), \varphi_{p(k)+1}(w), \dots, \varphi_M(w)\}$,

the values of $\Phi_k$ being sequences of $M - p(k) + 1$ numbers. (In these terms,
the rule for tie-breaking already explained becomes “select an $i$ for which
$\Phi_k(w_i)$ is maximal (in the sense of lexicographical ordering)”.)
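
A short sketch may help fix the role of the starting indices. Everything named here (`make_Phi`, `select_indices`) is hypothetical, and the five functions are an assumed reading of the example that leads to Alternative A: y, −x, a dummy, x, y². The dummy third function is set to a constant purely for illustration, since it never affects the selection.

```python
# Sketch of Definition 3.1: Phi_k(w) is the tuple (phi_{p(k)}(w), ..., phi_M(w)).
# Function names and the particular phi's are illustrative assumptions.

def make_Phi(phis, starts):
    """Build Phi_1..Phi_m from M functions and 1-based starting indices p(1..m)."""
    def build(k):
        tail = phis[starts[k - 1] - 1:]          # phi_{p(k)}, ..., phi_M
        return lambda w: tuple(f(w) for f in tail)
    return [build(k) for k in range(1, len(starts) + 1)]

def select_indices(points, Phis):
    """Select i(1), i(2), ...: each maximizes Phi_k among points not yet chosen."""
    chosen = []
    for Phi in Phis:
        candidates = [i for i in range(len(points)) if i not in chosen]
        if not candidates:
            break
        chosen.append(max(candidates, key=lambda i: Phi(points[i])))
    return chosen

phis = [
    lambda w: w[1],        # phi_1 = y
    lambda w: -w[0],       # phi_2 = -x
    lambda w: 0.0,         # phi_3: dummy placeholder (assumption)
    lambda w: w[0],        # phi_4 = x
    lambda w: w[1] ** 2,   # phi_5 = y^2
]
Phis = make_Phi(phis, starts=[1, 4])             # p(1) = 1, p(2) = 4
points = [(1, 9), (3, 9), (2, 6)]
chosen_points = [points[i] for i in select_indices(points, Phis)]
lengths = [len(Phi(points[0])) for Phi in Phis]  # M - p(k) + 1 for each k
```

Here the first selection is (1, 9) and the second is (3, 9), as in Alternative A, and the two sequences have lengths 5 and 2.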


4. The blocks and cuts determined by n points. 4.1 DEFINITION: Given
an m-system of functions $\varphi_1, \varphi_2, \dots, \varphi_M$ and n points
$w_1, w_2, \dots, w_n$ ($m \le n$), the corresponding blocks and cuts are given by
the following procedure (the $\Phi$'s are defined in 3.1): First $i(1)$ is selected
to maximize $\Phi_1(w_i)$, when

    $S_1 = \{w \mid \Phi_1(w) > \Phi_1(w_{i(1)})\}$,
    $T_1 = \{w \mid \Phi_1(w) = \Phi_1(w_{i(1)})\}$.

Next, $i(2)$ is selected $\ne i(1)$ and to maximize $\Phi_2(w_i)$ among such $i$, when

    $S_2 = \{w \mid \Phi_1(w) < \Phi_1(w_{i(1)}),\ \Phi_2(w) > \Phi_2(w_{i(2)})\}$,
    $T_2 = \{w \mid \Phi_1(w) < \Phi_1(w_{i(1)}),\ \Phi_2(w) = \Phi_2(w_{i(2)})\}$.

    ... (the construction is perfectly analogous to II-4.1)

    $S_{m|n+1} = \{w \mid \Phi_k(w) < \Phi_k(w_{i(k)}),\ k = 1, 2, \dots, m\}$.


4.2 DEFINITION: If $m = n$, then $S_{m|n+1}$ is also denoted by $S_{n+1}$.

If $m > n$, then only $\Phi_1, \Phi_2, \dots, \Phi_n$ are used and $S_{n|n+1}$ is
also denoted by $S_{n+1}$.

We denote by $\lambda$ a subset (possibly none, possibly all) of the indices
$1, 2, \dots, m$ and $m|n+1$ or, in case $m \ge n$, of the indices
$1, 2, \dots, n + 1$.

4.3 DEFINITION: The block-group $B_\lambda$ consists of the union of all $S_i$ with
$i$ in $\lambda$ and all $T_i$ with both $i$ and $i + 1$ in $\lambda$ ($m + 1$ means
$m|n + 1$).

The closed block-group $\bar{B}_\lambda$ consists of the union of all $S_i$ with
$i$ in $\lambda$ and all $T_i$ with either $i$ or $i + 1$ in $\lambda$.

Given any set we define its coverage as the proportion of the population falling 
into it (here the underlying probability distribution appears for the first time in 
this section), and we use 

4.4 DEFINITION: The coverage of $B_\lambda$ is denoted by $C(\lambda)$ and that of
$\bar{B}_\lambda$ by $\bar{C}(\lambda)$.

Thus, given a family of functions ¢ and n points w, the space of the w is divided 
into blocks and cuts, these are joined together into block-groups, and these 
block-groups have coverages. Thus, if the family of functions is fixed, the n 
points determine these coverages, and, if the points are chance points, the cover- 
ages are chance numbers. 
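
As an illustration of this section, the following sketch classifies population points into blocks and cuts and adds up coverages of block-groups. It is a toy under stated assumptions: the function names, the two-function system (y then x, no extra functions), the selected points and the five-point population are all hypothetical choices, and the last block $S_{m|n+1}$ is indexed as m + 1.

```python
from fractions import Fraction

# Sketch of Definitions 4.1, 4.3, 4.4 for a discrete population.
# All names and the toy data are illustrative assumptions.

def classify(w, Phis, selected):
    """Return ('S', k) or ('T', k); the final block S_{m|n+1} is ('S', m + 1)."""
    for k, (Phi, w_k) in enumerate(zip(Phis, selected), start=1):
        if Phi(w) > Phi(w_k):
            return ("S", k)
        if Phi(w) == Phi(w_k):
            return ("T", k)
    return ("S", len(Phis) + 1)

def coverages(population, Phis, selected):
    """Probability mass of each block and cut under the population distribution."""
    cover = {}
    for w, prob in population.items():
        region = classify(w, Phis, selected)
        cover[region] = cover.get(region, 0) + prob
    return cover

def block_group_coverage(lam, cover, m, closed=False):
    """C(lambda) if closed is False, otherwise the closed-group coverage."""
    total = sum(cover.get(("S", i), 0) for i in lam)
    for i in range(1, m + 1):
        both = i in lam and (i + 1) in lam
        either = i in lam or (i + 1) in lam
        if (either if closed else both):
            total += cover.get(("T", i), 0)
    return total

Phis = [lambda w: (w[1], w[0]), lambda w: (w[0],)]   # Phi_1 = (y, x), Phi_2 = (x)
selected = [(3, 9), (2, 6)]                          # w_{i(1)}, w_{i(2)}
fifth = Fraction(1, 5)
population = {(1, 9): fifth, (3, 9): fifth, (2, 6): fifth, (4, 10): fifth, (0, 0): fifth}

cover = coverages(population, Phis, selected)
open_c = block_group_coverage({1}, cover, m=2)
closed_c = block_group_coverage({1}, cover, m=2, closed=True)
```

For the group consisting of the first block alone, the open coverage counts only $S_1$, while the closed coverage also counts the adjoining cut $T_1$.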


5. Statement of results. Having discussed the construction, we can now 
state the results. 

(5.1) THEOREM $A^*_{m|n+1}$. Let $\varphi_1, \varphi_2, \dots, \varphi_M$ be any
m-system of functions and let $W_1, W_2, \dots, W_n$, where $m \le n$, be a sample
from any distribution, let the blocks, cuts, block-groups and coverages be formed,
as described above, using the same (unknown) distribution for forming the
coverages. Then, if $\lambda_1, \lambda_2, \dots, \lambda_p$ are any set of
$\lambda$'s (each $\lambda$ is a set of indices!),

    $\Pr\{C(\lambda_1) < a_1,\ C(\lambda_2) < a_2,\ \dots,\ \bar{C}(\lambda_h) > a_h,\ \dots,\ \bar{C}(\lambda_p) > a_p\}$
    $\ge \Pr\{t(\lambda_1) < a_1,\ t(\lambda_2) < a_2,\ \dots,\ t(\lambda_h) > a_h,\ \dots,\ t(\lambda_p) > a_p\}$,

where $t(\lambda) = \sum t_i$ for $i$ in $\lambda$,
$t_{m|n+1} = t_{m+1} + \dots + t_{n+1}$, and $t_1, t_2, \dots, t_{n+1}$ have
a uniform distribution on the barycentric simplex. (Compare Theorem $A_{m|n+1}$ of
Paper II [2].)

In particular,

    $\Pr\{C(i) < a\} \ge I_a(1, n) \ge \Pr\{\bar{C}(i) < a\}$,  $i = 1, 2, \dots, m$,

where $I_a(1, n)$ is the incomplete Beta-function.
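
For numerical work the bound $I_a(1, n)$ has a simple closed form, assuming (as the notation suggests) that it denotes the regularized incomplete Beta-function with parameters 1 and n: $I_a(1, n) = 1 - (1 - a)^n$. The sketch below, with hypothetical function names, evaluates it and checks it against a crude midpoint-rule integration of the Beta(1, n) density $n(1 - t)^{n-1}$.

```python
# I_a(1, n) in closed form, assuming it is the regularized incomplete Beta
# function with parameters (1, n). Function names are illustrative.

def incomplete_beta_1_n(a, n):
    """Closed form 1 - (1 - a)^n for 0 <= a <= 1."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1]")
    return 1.0 - (1.0 - a) ** n

def incomplete_beta_1_n_numeric(a, n, steps=10000):
    """Midpoint-rule integral of the Beta(1, n) density n(1 - t)^(n - 1) over [0, a]."""
    h = a / steps
    return sum(n * (1.0 - (k + 0.5) * h) ** (n - 1) * h for k in range(steps))

closed_form = incomplete_beta_1_n(0.5, 3)
numeric = incomplete_beta_1_n_numeric(0.5, 3)
```

For a = 0.5 and n = 3 the closed form gives 0.875.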


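(5.2) THEOREM $B^*_{n+1}$. Let $\varphi_1, \varphi_2, \dots, \varphi_M$ be any
n-system of functions and let $W_1, W_2, \dots, W_n$ be a sample from any
distribution. Then

    $\Pr\{C(\lambda_1) < a_1,\ C(\lambda_2) < a_2,\ \dots,\ \bar{C}(\lambda_h) > a_h,\ \dots,\ \bar{C}(\lambda_p) > a_p\}$
    $\ge \Pr\{t(\lambda_1) < a_1,\ t(\lambda_2) < a_2,\ \dots,\ t(\lambda_h) > a_h,\ \dots,\ t(\lambda_p) > a_p\}$,

where $t(\lambda) = \sum t_i$ for $i$ in $\lambda$ and $t_1, t_2, \dots, t_{n+1}$
have a uniform distribution on the barycentric simplex. In particular,

    $\Pr\{C(i) < a\} \ge I_a(1, n) \ge \Pr\{\bar{C}(i) < a\}$,  $i = 1, 2, \dots, n + 1$.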


For convenience of reference, we also state the representation theorem as: 

(5.3) THEOREM C. Let $X_1, X_2, \dots, X_n$ have any joint n-variate distribution.
Then there exist (real) functions $g_1, g_2, \dots, g_n$ and a joint distribution
for $U_1, U_2, \dots, U_n$ such that,

    (i) the marginal distribution of each $U_i$ is uniform on [0, 1],
    (ii) each function $g_i$ is non-decreasing,
    (iii) the distribution of $g_1(U_1), g_2(U_2), \dots, g_n(U_n)$ is identical
          with that of $X_1, X_2, \dots, X_n$.


6. The functions $\psi$. The aim of this section is to prove

(6.1) LEMMA. Given any m-system of functions $\varphi_1, \varphi_2, \dots, \varphi_M$,
there exist real functions $\psi_1, \psi_2, \dots, \psi_m$ such that, if
$W_1, W_2, \dots, W_n$ are a sample from the distribution concerned:

(6.2) $\Pr\{\psi_i(W_j) = \psi_i(W_k)$, but $\psi_{i+h}(W_j) \ne \psi_{i+h}(W_k)$ for some $h > 0\} = 0$.

(6.3) $\Pr\{\Phi_i(W_j)$ has a different relation to $\Phi_i(W_k)$ than that of $\psi_i(W_j)$ to $\psi_i(W_k)\} = 0$,
where by relation is meant >, =, or <.

The $\psi_i$ will depend on the underlying probability distribution. Thus they are
useful in the proof, but could not replace the $\Phi_i$ in the statement of the
theorems.

(6.4) LEMMA. Let $\Phi(w)$ have its values in a totally ordered set (i.e. always
either $\Phi_1 < \Phi_2$, $\Phi_1 = \Phi_2$ or $\Phi_1 > \Phi_2$) and let $W$ have
a distribution. Consider the function $\psi$,

    $\psi(w) = \Pr\{\Phi(W) < \Phi(w)\}$.

Let $W_1, W_2, \dots, W_n$ be a sample from the same distribution, then, with
probability one, the relation (<, =, or >) between $\Phi(W_j)$ and $\Phi(W_k)$ is
the same as that between $\psi(W_j)$ and $\psi(W_k)$.

If $\Phi(w_j) \le \Phi(w_k)$, then $\psi(w_j) \le \psi(w_k)$; if
$\psi(w_j) < \psi(w_k)$, then $\Phi(w_j) < \Phi(w_k)$. These follow directly from
the definition. To prove the lemma, then, we must show that

(i) $\psi(w_j) = \psi(w_k)$ but $\Phi(w_j) < \Phi(w_k)$ occurs with probability zero.

We may clearly assume that the totally ordered set is complete, and that, in
particular, it contains the symbols $-\infty$ and $+\infty$. Consider the real
function of an abstract variable,

    $F(s) = \Pr\{\Phi(W) < s\}$.

It is a monotone function, with $F(-\infty) = 0$ and $F(+\infty) = 1$. We can
therefore, given $\epsilon > 0$, select elements
$-\infty = s_0 < s_1 < s_2 < \dots < s_k = +\infty$ such that

    $0 \le F(s_{i+1}) - F(s_i + 0) \le \epsilon$.

If (i) occurs, then $\Phi(w_j)$ and $\Phi(w_k)$ belong either to the same open
interval $(s_i, s_{i+1})$ or one belongs to an open interval and the other is its
upper endpoint. The probability of either of these happening is at most

    $\frac{n(n-1)}{2}\{F(s_{i+1}) - F(s_i + 0)\}^2 + n(n-1)\{F(s_{i+1}) - F(s_i + 0)\}\{F(s_{i+1} + 0) - F(s_{i+1})\}$.

Summing this over all intervals yields an estimate of

    $n(n-1) \operatorname{Max}\{F(s_{i+1}) - F(s_i + 0)\} \le n^2 \epsilon$.

Since this goes to zero, the lemma is established.
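
The content of Lemma (6.4) is easy to check on a small discrete distribution. The sketch below is a hypothetical illustration (the names `make_psi` and `relation` and the four-point population are assumptions): it builds $\psi(w) = \Pr\{\Phi(W) < \Phi(w)\}$ and compares the order relations of $\Phi$ and $\psi$ on every pair of points carrying positive probability.

```python
from fractions import Fraction
from itertools import product

# Sketch of Lemma (6.4) on a finite population. Names and data are illustrative.

def make_psi(population, Phi):
    """psi(w) = Pr{Phi(W) < Phi(w)} under the given discrete distribution."""
    def psi(w):
        return sum(p for v, p in population.items() if Phi(v) < Phi(w))
    return psi

def relation(a, b):
    """+1, 0 or -1 according as a > b, a = b or a < b."""
    return (a > b) - (a < b)

quarter = Fraction(1, 4)
population = {(1, 9): quarter, (3, 9): quarter, (2, 9): quarter, (2, 6): quarter}
Phi = lambda w: (w[1],)          # a one-term sequence: ties in y are genuine ties
psi = make_psi(population, Phi)

agreements = [
    relation(Phi(a), Phi(b)) == relation(psi(a), psi(b))
    for a, b in product(population, repeat=2)
]
```

On points of positive probability the two relations agree exactly; the probability-zero exceptions in the lemma arise only for values of $\Phi$ that the sample cannot take.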

We turn now to the proof of (6.1). The system of functions
$\varphi_1, \varphi_2, \dots, \varphi_M$ define the $\Phi_1, \Phi_2, \dots, \Phi_m$
according to Section 3. These define $\psi_1, \psi_2, \dots, \psi_m$ according to
lemma (6.4) just proved. Applying this m times proves (6.3). Recalling that
$\Phi_i(w_j) = \Phi_i(w_k)$ implies $\Phi_{i+h}(w_j) = \Phi_{i+h}(w_k)$, we see
that (6.3) implies (6.2).


7. The notation $F(x + \lambda \cdot 0)$. All practitioners of analysis are
familiar with $F(x + 0)$ and $F(x - 0)$, defined by

    $F(x \pm 0) = \lim_{h \to 0+} F(x \pm h)$.

We now generalize this formal notation to

(7.1)    $F(x + \lambda \cdot 0) = \frac{1 + \lambda}{2} F(x + 0) + \frac{1 - \lambda}{2} F(x - 0)$,

where we will, in our immediate applications, need only $\lambda$'s between $-1$
and $+1$ (although the definition applies in general). Notice, for example, that

    $F(x - 0) \le F(x + \lambda \cdot 0) \le F(x + 0)$, for $-1 \le \lambda \le 1$,

that if $F$ is continuous at $x$,

    $F(x + \lambda \cdot 0) = F(x + 0) = F(x)$,

that the condition for $F$ to be normalized is

    $F(x + 0 \cdot 0) = F(x)$.
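
For a discrete distribution the one-sided limits are $F(x + 0) = \Pr\{X \le x\}$ and $F(x - 0) = \Pr\{X < x\}$, so the notation (7.1) can be evaluated directly. The sketch below is illustrative; the function names and the three-point distribution are assumptions.

```python
from fractions import Fraction

# Sketch of (7.1) for a discrete distribution given as {value: probability}.
# Function names and the example distribution are illustrative.

def F_plus(dist, x):
    """F(x + 0) = Pr{X <= x}."""
    return sum(p for v, p in dist.items() if v <= x)

def F_minus(dist, x):
    """F(x - 0) = Pr{X < x}."""
    return sum(p for v, p in dist.items() if v < x)

def F_lambda(dist, x, lam):
    """F(x + lam * 0) = (1 + lam)/2 * F(x + 0) + (1 - lam)/2 * F(x - 0)."""
    half = Fraction(1, 2)
    return (1 + lam) * half * F_plus(dist, x) + (1 - lam) * half * F_minus(dist, x)

dist = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

at_jump = [F_lambda(dist, 1, lam) for lam in (-1, 0, 1)]
at_continuity = [F_lambda(dist, Fraction(1, 2), lam) for lam in (-1, 0, 1)]
```

At the jump x = 1 the value moves linearly from $F(x - 0)$ to $F(x + 0)$ as $\lambda$ goes from −1 to +1; at a continuity point it does not depend on $\lambda$.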
A similar definition is made for functions of two variables, namely

    $F(x + \lambda \cdot 0,\ y + \mu \cdot 0) = \frac{1 + \mu}{2} F(x + \lambda \cdot 0,\ y + 0) + \frac{1 - \mu}{2} F(x + \lambda \cdot 0,\ y - 0)$

    $= \frac{1 + \lambda}{2} F(x + 0,\ y + \mu \cdot 0) + \frac{1 - \lambda}{2} F(x - 0,\ y + \mu \cdot 0)$,

where the two right-hand sides are equal if, as is the case for cumulatives, all
doubly one-sided limits exist.

If $F(x_1, x_2)$ is the joint cumulative of two variates, then, when all ordinates
and abscissas involved are ordinates and abscissas of continuity,

    $\Pr\{a < x < b,\ c < y < d\} = F(b, d) - F(b, c) - F(a, d) + F(a, c) \ge 0$.

Passing to the limit in assorted ways, and taking linear combinations gives

(7.2)    $F(b + \mu \cdot 0,\ d + \rho \cdot 0) - F(b + \mu \cdot 0,\ c + \nu \cdot 0) - F(a + \lambda \cdot 0,\ d + \rho \cdot 0) + F(a + \lambda \cdot 0,\ c + \nu \cdot 0) \ge 0$,

for $-\infty \le a, b, c, d \le +\infty$ and $-1 \le \lambda, \mu, \nu, \rho \le 1$.
This will be of use shortly.


8. The representation theorem. It was shown in Paper I [1] of this series
that the uniform distribution on [0, 1] could serve as the prototype of any variate;
that is, that given a distribution, there is a monotone function $g$, so that $g(U)$
has the given distribution, where $U$ has the uniform distribution on [0, 1].
(In Paper I, $U$ was denoted by $X^*$.)

In the notation of the last section, there is a function $\lambda(u)$, with
$|\lambda(u)| \le 1$, so that

(8.1)    $F(g(u) + \lambda(u) \cdot 0) = u$,

for all $u$. (We may, and shall, require that $g(u) = -\infty$ for $u < 0$, and
$g(u) = +\infty$ for $u > 1$.) It is easy to see that $g(u)$ is unique except on
a set of probability zero and that $\lambda(u)$ is unique (and in fact linear) on
each open interval which contains no value of $F(x)$.
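
For a discrete distribution, one way to obtain a pair $g(u)$, $\lambda(u)$ satisfying (8.1) is to let $g(u)$ be the smallest support point whose cumulative probability reaches $u$ and to solve (7.1) for $\lambda$ across the jump at that point. The sketch below is a hypothetical construction along these lines (names and data are assumptions), restricted to $0 \le u \le 1$ and to support points with positive probability.

```python
from fractions import Fraction

# Sketch of (8.1) for a discrete distribution {value: probability > 0}.
# The construction and all names are illustrative assumptions.

def g_and_lambda(dist, u):
    """Return (g(u), lambda(u)) with F(g(u) + lambda(u) * 0) = u, for 0 <= u <= 1."""
    below = Fraction(0)                      # F(x - 0) at the current support point
    for x in sorted(dist):
        above = below + dist[x]              # F(x + 0)
        if above >= u:
            lam = (2 * u - above - below) / (above - below)
            return x, lam
        below = above
    raise ValueError("u must lie in [0, 1]")

def F_lambda(dist, x, lam):
    """(7.1): (1 + lam)/2 * Pr{X <= x} + (1 - lam)/2 * Pr{X < x}."""
    plus = sum(p for v, p in dist.items() if v <= x)
    minus = sum(p for v, p in dist.items() if v < x)
    return (1 + lam) * Fraction(1, 2) * plus + (1 - lam) * Fraction(1, 2) * minus

dist = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
us = [Fraction(k, 8) for k in range(9)]
pairs = [g_and_lambda(dist, u) for u in us]
recovered = [F_lambda(dist, x, lam) for x, lam in pairs]
```

Here $\lambda(u)$ is linear in $u$ across each jump and $g$ is non-decreasing, as the text asserts.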

Each cumulative $F(x)$, then, serves to define $g(u)$ and $\lambda(u)$ by the
equation (8.1). Two or more independent variates can be thrown back on a set of
independent uniform variates by applying this process to their cumulatives
separately.

Our present problem is to prove Theorem C (5.3), which applies to variates
$X_1, X_2, \dots, X_n$ which need not be independent. Let $F_i(x_i)$ be the
(marginal) cumulative of $X_i$, and use (8.1) to define $g_i(u_i)$ and
$\lambda_i(u_i)$. Then define the joint distribution of $U_1, U_2, \dots, U_n$ by

    $G(u_1, u_2, \dots, u_n) = F(g_1(u_1) + \lambda_1(u_1) \cdot 0,\ \dots,\ g_n(u_n) + \lambda_n(u_n) \cdot 0)$,

where $F(x_1, x_2, \dots, x_n)$ is the joint cumulative of $X_1, X_2, \dots, X_n$.

We shall verify that this is the desired distribution in the case $n = 2$, leaving
the general case to the reader. Consider
$G(u_1, +\infty) = G(u_1, 1) = F(g_1(u_1) + \lambda_1(u_1) \cdot 0,\ +\infty)$.
This is a cumulative, and so is $G(+\infty, u_2)$. In fact, using (8.1) they are
each the uniform cumulative

    $G(u) = \begin{cases} 0, & u \le 0, \\ u, & 0 \le u \le 1, \\ 1, & 1 \le u. \end{cases}$

By (7.2) all second differences are positive, and hence $G(u_1, u_2)$ is a joint
cumulative. Since its marginals are uniform, it is continuous.
Finally,

    $\Pr\{g_1(U_1) < s_1,\ g_2(U_2) < s_2\} = G(F(s_1 - 0, +\infty),\ F(+\infty, s_2 - 0)) = F(s_1 - 0,\ s_2 - 0)$,

since $g_1(u_1) < s_1$ is equivalent to $u_1 < F(s_1 - 0, +\infty)$ and
$g_2(u_2) < s_2$ is equivalent to $u_2 < F(+\infty, s_2 - 0)$. Thus $g_1(U_1)$ and
$g_2(U_2)$ have the given bivariate distribution.


9. Proof of main theorems. We come now to the proof of Theorems $A^*_{m|n+1}$
and $B^*_{n+1}$, and we begin with $A^*_{m|n+1}$. According to Lemma (6.1), the
various indices $i(1), i(2), \dots, i(m)$ selected to determine the blocks will be
the same, excluding cases of probability zero, whether the $\Phi_i$ or the
$\psi_i$ are used. Consider the first block, which takes the forms:

    $S_1 = \{W \mid \Phi_1(W) > \Phi_1(w_{i(1)})\}$,
    $S_1 = \{W \mid \psi_1(W) > \psi_1(w_{i(1)})\}$.

Another application of Lemma (6.1) shows that these sets differ by a set of
probability zero, and hence their coverages are identical. It will thus suffice to
prove Theorem $A^*_{m|n+1}$ for a fixed underlying distribution and the
corresponding $\psi_1, \psi_2, \dots, \psi_m$.

According to Theorem C (5.3), the m-variate distribution of the $\psi_i(W)$ can be
represented in terms of uniformly distributed variates $U_1, \dots, U_m$ and
monotone functions $g_1(U_1), \dots, g_m(U_m)$. Now $U_1, U_2, \dots, U_m$ have a
continuous joint cumulative, so that Theorem $A_{m|n+1}$ applies to a sample of n
drawn from this m-variate population, with the coordinates themselves as the m
functions. We shall denote the coordinates of the $i$-th element of this sample by
$u_1(i), \dots, u_m(i)$. Consider the first block,

    $S_1 = \{(U_1, \dots, U_m) \mid U_1 > u_1(i(1))\}$.

Its image,

    $g(S_1) = \{(g_1(U_1), \dots, g_m(U_m)) \mid U_1 > u_1(i(1))\}$,

contains

    $S_1^* = \{(g_1(U_1), \dots, g_m(U_m)) \mid g_1(U_1) > g_1(u_1(i(1)))\}$,

and is contained in the union of $S_1^*$ and $T_1^*$, where

    $T_1^* = \{(g_1(U_1), \dots, g_m(U_m)) \mid g_1(U_1) = g_1(u_1(i(1)))\}$.

Thus the conclusions of Theorem $A^*_{m|n+1}$ hold for $S_1^*, \dots, S^*_{m|n+1}$.
Now while Theorem $A^*_{m|n+1}$ mentions the underlying $W$'s implicitly, careful
study shows that they are not really involved; only the joint distribution of the
$\varphi_i$, which in our present case are the $\psi_i$, matters. Since this is the
same for the $\psi_i(W)$ and the $g_i(U_i)$, Theorem $A^*_{m|n+1}$ must hold for
the $\psi_i$ and the theorem is proved.

Theorem $B^*_{n+1}$ is again a special case of Theorem $A^*_{n|n+1}$.


REFERENCES 


[1] H. Scheffé and J. W. Tukey, “Nonparametric Estimation I. Validation of order
statistics,” Annals of Math. Stat., Vol. 16 (1945), pp. 187-192 (Also cited as
Paper I).

[2] J. W. Tukey, “Nonparametric Estimation II. Statistically equivalent blocks and
multivariate tolerance regions. The continuous case,” Annals of Math. Stat., Vol.
18 (1947), pp. 529-539 (Also cited as Paper II).





ASYMPTOTIC PROPERTIES OF THE MAXIMUM LIKELIHOOD 
ESTIMATE OF AN UNKNOWN PARAMETER OF A DISCRETE 
STOCHASTIC PROCESS 


By ABRAHAM WALD 


Columbia University 


Summary. Asymptotic properties of maximum likelihood estimates have 
been studied so far mainly in the case of independent observations. In this 
paper the case of stochastically dependent observations is considered. It is 
shown that under certain restrictions on the joint probability distribution of the 
observations the maximum likelihood equation has at least one root which is a 
consistent estimate of the parameter $\theta$ to be estimated. Furthermore, any root
of the maximum likelihood equation which is a consistent estimate of $\theta$ is shown
to be asymptotically efficient. Since the maximum likelihood estimate is always 
a root of the maximum likelihood equation, consistency of the maximum likeli- 
hood estimate implies its asymptotic efficiency. 


1. Introduction. Let $\{X_i\}$, ($i = 1, 2, \dots$, ad inf.), be a sequence of
chance variables. It is assumed that for any positive integral value $n$ the first
$n$ chance variables $X_1, \dots, X_n$ admit a joint probability density function
$p_n(x_1, \dots, x_n, \theta)$ involving an unknown parameter $\theta$. The
consistency relations

(1.1)    $\int_{-\infty}^{+\infty} p_{n+1}(x_1, \dots, x_{n+1}, \theta)\, dx_{n+1} = p_n(x_1, \dots, x_n, \theta)$

are assumed to hold.

In what follows, for any chance variable $u$ the symbol $E(u \mid \theta)$ will
denote the expected value of $u$ when $\theta$ is the true parameter value.

Let $t_n(x_1, \dots, x_n)$ be an unbiassed estimate of $\theta$. Cramér [1] and
Rao [2] have shown that under some weak regularity conditions on the distribution
function $p_n(x_1, \dots, x_n, \theta)$, the variance of $t_n$ cannot fall short of
the value

(1.2)    $\frac{1}{c_n(\theta)} = \frac{1}{E\left[\left(\frac{\partial \log p_n}{\partial \theta}\right)^2 \Big|\ \theta\right]}$.

Thus, for any unbiassed estimate $t_n$ the variate
$\sqrt{c_n(\theta)}\,(t_n - \theta)$ has mean value zero and variance $\ge 1$. An
estimate $t_n$ is called efficient if $\sqrt{c_n(\theta)}\,(t_n - \theta)$ has
mean value zero and variance 1.

A sequence $\{t_n\}$, ($n = 1, 2, \dots$, ad inf.), of estimates is said to be
asymptotically efficient if the mean of $\sqrt{c_n(\theta)}\,(t_n - \theta)$ is
zero and the variance of $\sqrt{c_n(\theta)}\,(t_n - \theta)$ is 1 in the limit as
$n \to \infty$. In the literature usually the additional requirement is made that
the limiting distribution of $\sqrt{c_n(\theta)}\,(t_n - \theta)$ be normal.
To make a distinction between the two cases when the condition concerning the
limiting distribution of $\sqrt{c_n(\theta)}\,(t_n - \theta)$ is fulfilled or not,
we shall say that $\{t_n\}$ is asymptotically efficient in the wide sense if it
satisfies the conditions concerning the mean and the variance of
$\sqrt{c_n(\theta)}\,(t_n - \theta)$. If, in addition, the limiting distribution
of $\sqrt{c_n(\theta)}\,(t_n - \theta)$ is normal, we shall say that $\{t_n\}$ is
asymptotically efficient in the strict sense. Clearly, if $\{t_n\}$ is
asymptotically efficient in the strict sense, it is also asymptotically efficient
in the wide sense.

A word of clarification is needed as to the meaning of the conditions concerning
the mean and variance of $\sqrt{c_n(\theta)}\,(t_n - \theta)$. One interpretation
would be that the requirement is that

(1.3)    $\lim_{n \to \infty} E[\sqrt{c_n(\theta)}\,(t_n - \theta) \mid \theta] = 0$

and

(1.4)    $\lim_{n \to \infty} E[c_n(\theta)(t_n - \theta)^2 \mid \theta] = 1$.

Another interpretation would be that the requirement is that the limiting
distribution of $\sqrt{c_n(\theta)}\,(t_n - \theta)$, provided that the limit
distribution exists as $n \to \infty$, should have zero mean and unit variance.
These two interpretations are certainly not equivalent. It seems to the author that
the mean and variance of the limiting distribution is more relevant than the limits
of the mean and the variance. We shall, therefore, adopt the following definition
of asymptotic efficiency:

Definition: A sequence $\{t_n\}$ of estimates is said to be asymptotically
efficient in the wide sense if a sequence $\{u_n\}$, ($n = 1, 2, \dots$, ad inf.),
of chance variables exists such that

(1.5)    $\lim_{n \to \infty} E(u_n \mid \theta) = 0$,  $\lim_{n \to \infty} E(u_n^2 \mid \theta) = 1$

and

(1.6)    $\sqrt{c_n(\theta)}\,(t_n - \theta) - u_n$

converges stochastically to zero as $n \to \infty$. If, in addition, the limiting
distribution of $\sqrt{c_n(\theta)}\,(t_n - \theta)$ exists and is normal,
$\{t_n\}$ is said to be asymptotically efficient in the strict sense.

The reason that a sequence $\{u_n\}$ of chance variables is considered in the above
definition, instead of the limiting distribution of
$\sqrt{c_n(\theta)}\,(t_n - \theta)$, is that the existence of a limiting
distribution of $\sqrt{c_n(\theta)}\,(t_n - \theta)$ is not postulated. If a
limiting distribution of $\sqrt{c_n(\theta)}\,(t_n - \theta)$ exists and if this
limiting distribution has zero mean and unit variance, a sequence $\{u_n\}$ of
chance variables satisfying the conditions (1.5) and (1.6) always exists. This can
be seen as follows: Let $T_n$ denote the chance variable
$\sqrt{c_n(\theta)}\,(t_n - \theta)$ and let $F_n(t) = \text{prob.}\{T_n < t\}$.
If a limiting distribution of $T_n$ exists and if this limiting distribution has
zero mean and unit variance, then

(1.7)    $\lim_{a \to \infty} \left[\lim_{n \to \infty} \int_{-a}^{a} t\, dF_n(t)\right] = 0$  and  $\lim_{a \to \infty} \left[\lim_{n \to \infty} \int_{-a}^{a} t^2\, dF_n(t)\right] = 1$.

From (1.7) it follows that there exists a sequence $\{a_n\}$,
($n = 1, 2, \dots$, ad inf.), of positive values such that the following conditions
are fulfilled:

(1.8)    $\lim_{n \to \infty} \int_{-a_n}^{a_n} t\, dF_n(t) = 0$;  $\lim_{n \to \infty} \int_{-a_n}^{a_n} t^2\, dF_n(t) = 1$;  $\lim_{n \to \infty} \text{Prob}\{|T_n| > a_n\} = 0$.

Let $u_n$ be a chance variable which is equal to $T_n$ whenever $|T_n| \le a_n$,
and equal to zero otherwise. Clearly, the sequence $\{u_n\}$ will satisfy
conditions (1.5) and (1.6).

In the following section we shall formulate some assumptions concerning the
probability density function $p_n(x_1, \dots, x_n, \theta)$. It will then be shown
in section 3 that there exists a root of the maximum likelihood equation

(1.9)    $\frac{\partial \log p_n}{\partial \theta} = 0$

which is asymptotically efficient at least in the wide sense.


2. Assumptions concerning the probability density $p_n(x_1, \dots, x_n, \theta)$.
We shall assume that there exists a finite non-degenerate interval $A$ on the
$\theta$-axis such that the following conditions hold:

Condition 1. The derivatives $\frac{\partial^i p_n}{\partial \theta^i}$,
($i = 1, 2, 3$), exist for all $\theta$ in $A$ and for all samples
$(x_1, \dots, x_n)$ except perhaps for a set of measure zero. We have furthermore,

(2.1)    $\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \operatorname*{l.u.b.}_{\theta \in A} \left|\frac{\partial^i p_n}{\partial \theta^i}\right| dx_1 \cdots dx_n < \infty$,  ($i = 1, 2$).

Condition 2. For any $\theta$ in $A$ we have $\lim_{n \to \infty} c_n(\theta) = \infty$.


Condition 3. For any $\theta$ in $A$ the standard deviation of
$\frac{\partial^2 \log p_n}{\partial \theta^2}$ divided by the expected value of
$\frac{\partial^2 \log p_n}{\partial \theta^2}$ (both computed under the assumption
that $\theta$ is true) converges to zero as $n \to \infty$.

Condition 4. There exists a positive $\delta$ such that for any $\theta$ in $A$ the
expression

(2.2)    $\frac{1}{c_n(\theta)}\, E\left[\operatorname*{l.u.b.}_{\theta'} \left|\frac{\partial^3 \log p_n(x_1, \dots, x_n, \theta')}{\partial \theta'^3}\right|\ \Big|\ \theta\right]$

is a bounded function of $n$, where $\theta'$ is restricted to the interval
$|\theta' - \theta| \le \delta$.

In what follows in this section, as well as in section 3, the domain of $\theta$
will be restricted to interior points of the interval $A$ unless a statement to the
contrary is explicitly made.
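Clearly

(2.3)    $E\left(\frac{\partial \log p_n}{\partial \theta}\ \Big|\ \theta\right) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \frac{\partial p_n}{\partial \theta}\, dx_1 \cdots dx_n$.

It follows from Condition 1 that

(2.4)    $\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \frac{\partial p_n}{\partial \theta}\, dx_1 \cdots dx_n = \frac{\partial}{\partial \theta} \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} p_n\, dx_1 \cdots dx_n = 0$.

Hence,

(2.5)    $E\left(\frac{\partial \log p_n}{\partial \theta}\ \Big|\ \theta\right) = 0$.

We have

(2.6)    $\frac{\partial^2 \log p_n}{\partial \theta^2} = \frac{1}{p_n} \frac{\partial^2 p_n}{\partial \theta^2} - \left(\frac{\partial \log p_n}{\partial \theta}\right)^2$.

Hence

(2.7)    $E\left(\frac{\partial^2 \log p_n}{\partial \theta^2}\ \Big|\ \theta\right) = E\left(\frac{1}{p_n} \frac{\partial^2 p_n}{\partial \theta^2}\ \Big|\ \theta\right) - c_n(\theta)$.

But

(2.8)    $E\left(\frac{1}{p_n} \frac{\partial^2 p_n}{\partial \theta^2}\ \Big|\ \theta\right) = 0$

because of Condition 1. From (2.7) and (2.8) we obtain

(2.9)    $E\left(\frac{\partial^2 \log p_n}{\partial \theta^2}\ \Big|\ \theta\right) = -c_n(\theta)$.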


Conditions 3 and 4 will generally be fulfilled when the stochastic dependence
of $x_i$ on $x_j$ decreases sufficiently fast with increasing value of $|i - j|$.
For, in such cases, the following order relations will generally hold: The standard
deviation of $\frac{\partial^2 \log p_n}{\partial \theta^2}$ will, in general, be
of the order $\sqrt{n}$, the expected value of
$\operatorname*{l.u.b.}_{|\theta' - \theta| \le \delta} \left|\frac{\partial^3 \log p_n}{\partial \theta'^3}\right|$
will usually be of the order $n$, and $\frac{c_n(\theta)}{n}$ will generally have
a positive lower bound and a finite upper bound.













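3. Proof that the maximum likelihood equation has a root which is an
asymptotically efficient estimate of $\theta$ (at least in the wide sense). Let
$\theta_0$ denote the true parameter value and let $\theta$ be any other value. We
put

(3.1)    $\frac{\partial \log p_n}{\partial \theta} = \Phi_n$,  $\frac{\partial^2 \log p_n}{\partial \theta^2} = \Phi_n'$,  and  $\frac{\partial^3 \log p_n}{\partial \theta^3} = \Phi_n''$.

Expanding $\Phi_n(x_1, \dots, x_n, \theta)$ in a Taylor expansion around
$\theta = \theta_0$ we obtain

(3.2)    $\Phi_n(x_1, \dots, x_n, \theta) = \Phi_n(x_1, \dots, x_n, \theta_0) + (\theta - \theta_0)\, \Phi_n'(x_1, \dots, x_n, \theta_0) + \tfrac{1}{2} (\theta - \theta_0)^2\, \Phi_n''(x_1, \dots, x_n, \theta_n^*)$

where $\theta_n^*$ is some value between $\theta_0$ and $\theta$. Dividing both
sides of (3.2) by $c_n(\theta_0)$ we obtain

(3.3)    $\frac{\Phi_n(x_1, \dots, x_n, \theta)}{c_n(\theta_0)} = \frac{\Phi_n(x_1, \dots, x_n, \theta_0)}{c_n(\theta_0)} + (\theta - \theta_0) \frac{\Phi_n'(x_1, \dots, x_n, \theta_0)}{c_n(\theta_0)} + \tfrac{1}{2} (\theta - \theta_0)^2 \frac{\Phi_n''(x_1, \dots, x_n, \theta_n^*)}{c_n(\theta_0)}$.

From Condition 3 and equation (2.9) it follows that

(3.4)    $\operatorname*{plim}_{n \to \infty} \frac{\Phi_n'(x_1, \dots, x_n, \theta_0)}{c_n(\theta_0)} = -1$

where the operator plim stands for convergence in probability (stochastic
convergence).

According to equation (2.5) the expected value of
$\Phi_n(x_1, \dots, x_n, \theta_0)$ is zero. Since the variance of
$\Phi_n(x_1, \dots, x_n, \theta_0)$ is equal to $c_n(\theta_0)$, and since
$\lim_{n \to \infty} c_n(\theta_0) = \infty$, we have

(3.5)    $\operatorname*{plim}_{n \to \infty} \frac{\Phi_n(x_1, \dots, x_n, \theta_0)}{c_n(\theta_0)} = 0$.

It follows from Condition 4 that for any $\theta$ with
$|\theta - \theta_0| \le \delta$ we have

(3.6)    $\frac{1}{c_n(\theta_0)}\, E(|\Phi_n''(x_1, \dots, x_n, \theta_n^*)| \mid \theta_0) = O(1)$.

According to Markoff's inequality the probability that a positive random
variable will exceed $\lambda$-times its expected value is not greater than
$\frac{1}{\lambda}$. Hence, it follows from (3.6) that for any $\epsilon > 0$ we
can find a positive value $k_\epsilon$ such that

(3.7)    $\limsup_{n \to \infty} \text{Prob}\left\{\frac{1}{c_n(\theta_0)} |\Phi_n''(x_1, \dots, x_n, \theta_n^*)| \ge k_\epsilon\right\} \le \epsilon$.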

Let $\rho$ be any given positive number. The probability that the maximum
likelihood equation

(3.8)    $\Phi_n(x_1, \dots, x_n, \theta) = 0$

will have a root in the interval $(\theta_0 - \rho,\ \theta_0 + \rho)$ converges to
one as $n \to \infty$. This follows easily from (3.3), (3.4), (3.5) and (3.7).
Thus, we have shown that the maximum likelihood equation has a root
$\hat\theta_n$ which is a consistent estimate, i.e. it satisfies the relation

(3.9)    $\operatorname{plim}\, (\hat\theta_n - \theta_0) = 0$.


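We shall now show that if $\hat\theta_n$ is a root of the maximum likelihood
equation (3.8) and if $\hat\theta_n$ is a consistent estimate, then $\hat\theta_n$
is also asymptotically efficient, at least in the wide sense. For this purpose we
substitute $\hat\theta_n$ for $\theta$ in (3.3) and multiply both sides of the
equation by $\sqrt{c_n(\theta_0)}$. We then obtain

(3.10)    $0 = \frac{\Phi_n(x_1, \dots, x_n, \theta_0)}{\sqrt{c_n(\theta_0)}} + \sqrt{c_n(\theta_0)}\,(\hat\theta_n - \theta_0) \frac{\Phi_n'(x_1, \dots, x_n, \theta_0)}{c_n(\theta_0)} + \sqrt{c_n(\theta_0)}\,(\hat\theta_n - \theta_0)^2\, v_n$

where

(3.11)    $v_n = \frac{1}{2} \frac{\Phi_n''(x_1, \dots, x_n, \theta_n^*)}{c_n(\theta_0)}$.

Let

(3.12)    $y_n = \frac{\Phi_n(x_1, \dots, x_n, \theta_0)}{\sqrt{c_n(\theta_0)}}$  and  $z_n = \sqrt{c_n(\theta_0)}\,(\hat\theta_n - \theta_0)$.

Then (3.10) gives

(3.13)    $-y_n = z_n \frac{\Phi_n'(x_1, \dots, x_n, \theta_0)}{c_n(\theta_0)} + z_n (\hat\theta_n - \theta_0)\, v_n$.

It follows from (3.7) and (3.9) that

(3.14)    $\operatorname{plim}\, (\hat\theta_n - \theta_0)\, v_n = 0$.

From (3.4), (3.13) and (3.14) we obtain

(3.15)    $-y_n = z_n(-1 + \epsilon_n)$

where

(3.16)    $\operatorname{plim}\, \epsilon_n = 0$.

Since $E y_n = 0$ and $E y_n^2 = 1$, it follows from (3.15) and (3.16) that

(3.17)    $\operatorname{plim}\, (z_n - y_n) = 0$.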
The asymptotic efficiency (in the wide sense) of $\hat\theta_n$ is an immediate
consequence of (3.17). Our main result may be summarized in the following theorem:

THEOREM. If the true value of the parameter $\theta$ is an interior point of an
interval $A$ satisfying the conditions 1 - 4, then the maximum likelihood equation
(1.9) has a root¹ which is a consistent estimate of $\theta$. Furthermore, any
root of (1.9) which is a consistent estimate of $\theta$ is also asymptotically
efficient at least in the wide sense.

Since the maximum likelihood estimate is a root of (1.9), it follows from the
above theorem that whenever the maximum likelihood estimate is consistent,
it is also asymptotically efficient at least in the wide sense.
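
As a concrete illustration of a likelihood equation for dependent observations, consider a first-order autoregressive scheme. This example is not taken from the paper and is only a sketch under stated assumptions: $X_1$ is standard normal, $X_t = \theta X_{t-1} + e_t$ with independent standard normal $e_t$, so that $\partial \log p_n / \partial\theta = \sum_{t \ge 2} x_{t-1}(x_t - \theta x_{t-1})$ and the likelihood equation has the single root $\sum x_{t-1} x_t / \sum x_{t-1}^2$. Whether this model meets Conditions 1 to 4 is assumed rather than verified here, and all function names are hypothetical.

```python
import random

# Illustrative AR(1) example (an assumption, not from the paper):
# X_1 ~ N(0, 1), X_t = theta * X_{t-1} + e_t, e_t ~ N(0, 1) independent.

def simulate_ar1(theta, n, seed):
    """Draw one sample path of length n with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(theta * x[-1] + rng.gauss(0.0, 1.0))
    return x

def score(x, theta):
    """d log p_n / d theta = sum over t >= 2 of x_{t-1} * (x_t - theta * x_{t-1})."""
    return sum(prev * (cur - theta * prev) for prev, cur in zip(x, x[1:]))

def likelihood_root(x):
    """The unique root of the likelihood equation score(x, theta) = 0."""
    numerator = sum(prev * cur for prev, cur in zip(x, x[1:]))
    denominator = sum(prev * prev for prev in x[:-1])
    return numerator / denominator

theta_0 = 0.5
sample = simulate_ar1(theta_0, n=5000, seed=1)
theta_hat = likelihood_root(sample)
residual = score(sample, theta_hat)   # zero up to rounding error
```

With a long sample the root lies close to the true value, in line with the consistency asserted by the theorem for models that satisfy its conditions.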


REFERENCES 
[1] H. Cramér, Mathematical Methods of Statistics, Princeton Univ. Press, 1946.
[2] C. R. Rao, “Information and the accuracy attainable in the estimation of statistical 
parameters”, Bull. Calcutta Math. Soc., Vol. 37 (1945). 


¹ The probability that (1.9) has at least one root converges to unity as $n \to \infty$.


DISTRIBUTION OF A ROOT OF A DETERMINANTAL EQUATION 
By D. N. Nanda


Institute of Statistics, University of North Carolina 


1. Summary. S. N. Roy [2] obtained in 1943 the distribution of the maximum,
minimum and any intermediate one of the roots of certain determinantal
equations based on covariance matrices of two samples on the null hypothesis
of equal covariance matrices in the two populations. The present paper gives
a different method of working out the distribution of any of these roots under
the same hypothesis. The distribution of the largest, smallest and any intermediate
root, when the roots are specified by their position in a monotonic arrangement,
has been derived for p = 2, 3, 4, and 5 by the new method. The
method is applicable for obtaining the distribution of the roots of an equation of
any order, when the distributions of the roots of lower order equations have been
worked out.


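2. Introduction. If $x = \|x_{ij}\|$ and $x^* = \|x^*_{ij}\|$ are two $p$-variate sample
matrices with $n_1$ and $n_2$ degrees of freedom respectively, and $S = xx'/n_1$ and
$S^* = x^*x^{*\prime}/n_2$ are the covariance matrices which under the null hypothesis are
independent estimates of the same population covariance matrix, then the joint
distribution of the roots of the determinantal equation $|A - \theta(A + B)| = 0$,
where $A = n_1 S$ and $B = n_2 S^*$, has been obtained by Hsu [1] in 1939. The
distribution density is

$$R(l,\mu,\nu) = \pi^{l/2}\prod_{i=1}^{l}\frac{\Gamma\!\left(\frac{\mu+\nu+l+i-2}{2}\right)}{\Gamma\!\left(\frac{\mu+i-1}{2}\right)\Gamma\!\left(\frac{\nu+i-1}{2}\right)\Gamma\!\left(\frac{i}{2}\right)}\ \prod_{i=1}^{l}\theta_i^{(\mu-2)/2}\ \prod_{i=1}^{l}(1-\theta_i)^{(\nu-2)/2}\ \prod_{i<j}(\theta_i-\theta_j), \tag{1}$$

$$0 \le \theta_l \le \theta_{l-1} \le \cdots \le \theta_1 \le 1,$$

where $l = \min(p, n_1)$, $\mu = |p - n_1| + 1$ and $\nu = n_2 - p + 1$.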

This formula also gives the joint distribution of the squares of canonical
correlations on the null hypothesis that the two sets of variates are independent
[1]. If







are the observations on the two sets of canonical variates and the $z$'s are normally
distributed, independently of the $w$'s, then the equation for the canonical
roots is $|V_{zw}V_{ww}^{-1}V_{wz} - \theta V_{zz}| = 0$, where $\theta_i = r_i^2$ and $V_{zw} = \sum zw$, etc.

It is observed that $V_{zw}V_{ww}^{-1}V_{wz}$ is like $A$ with $n_1 = q$ and $V_{zz} - V_{zw}V_{ww}^{-1}V_{wz}$ is
like $B$ with $n_2 = N - q - 1$, and the above equation is reduced to the form
$|A - \theta(A + B)| = 0$. It is under this condition that $R(l, \mu, \nu)$ gives the joint
distribution density of $r_1^2, r_2^2, \dots, r_l^2$, where $l = \min(p, q)$, $\mu = |p - q| + 1$,
and $\nu = N - p - q$.


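3. Notation and preliminaries.
(a) Let

$$\prod_{i<j}(\theta_i - \theta_j) = \{1, 2, 3, \dots, l\}.$$

It is known that the value of the Vandermonde determinant

$$\begin{vmatrix} 1 & 1 & \cdots & 1 \\ \theta_1 & \theta_2 & \cdots & \theta_l \\ \vdots & \vdots & & \vdots \\ \theta_1^{\,l-1} & \theta_2^{\,l-1} & \cdots & \theta_l^{\,l-1} \end{vmatrix}$$

is equal to $\prod_{i>j}(\theta_i - \theta_j) = (-1)^{l(l-1)/2}\{1, 2, 3, \dots, l\}$.
Then

$$\begin{vmatrix} 1 & 1 & 1 \\ \theta_1 & \theta_2 & \theta_3 \\ \theta_1^2 & \theta_2^2 & \theta_3^2 \end{vmatrix} = (\theta_2 - \theta_1)(\theta_3 - \theta_2)(\theta_3 - \theta_1) = -\{1, 2, 3\},$$

but the determinant can also, by expansion in minors of the first row, be expressed as

$$-\bigl[\theta_1\theta_2\{1, 2\} + \theta_2\theta_3\{2, 3\} + \theta_3\theta_1\{3, 1\}\bigr]$$

where $\theta_1 - \theta_2 = \{1, 2\}$.

Hence
$$\{1, 2, 3\} = \theta_1\theta_2\{1, 2\} + \theta_3\theta_1\{3, 1\} + \theta_2\theta_3\{2, 3\}. \tag{2}$$
Similarly

$$\{1, 2, 3, 4\} = \theta_1\theta_2\theta_3\{1, 2, 3\} - \theta_4\theta_1\theta_2\{4, 1, 2\} + \theta_3\theta_4\theta_1\{3, 4, 1\} - \theta_2\theta_3\theta_4\{2, 3, 4\},$$

and

$$\{1, 2, 3, 4, 5\} = \theta_1\theta_2\theta_3\theta_4\{1, 2, 3, 4\} + \theta_5\theta_1\theta_2\theta_3\{5, 1, 2, 3\} + \theta_4\theta_5\theta_1\theta_2\{4, 5, 1, 2\} + \theta_3\theta_4\theta_5\theta_1\{3, 4, 5, 1\} + \theta_2\theta_3\theta_4\theta_5\{2, 3, 4, 5\}.$$

It is seen that in the successive terms the $\theta$'s are present in a decreasing order.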
(b) Let

$$(a, b; m, n) = \bigl[y^m(1 - y)^n\bigr]_a^b = b^m(1 - b)^n - a^m(1 - a)^n,$$

and

$$(a, 1, b; m, n) = \int_a^b y^m(1 - y)^n\,dy;$$

then

$$(a, 1, b; m + 1, n) = -\frac{(a, b; m + 1, n + 1)}{m + n + 2} + \frac{m + 1}{m + n + 2}\,(a, 1, b; m, n) \tag{4}$$

by a combination of the transformations obtained by partial integration and by
breaking up $(1 - y)^{n+1}$ into $(1 - y)^n - y(1 - y)^n$.
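The reduction formula for $(a, 1, b; m + 1, n)$ can be checked exactly for integer exponents. The sketch below is only an illustration: the function names `bracket`, `integral` and `recurrence_rhs` are not from the paper, and integer $m, n \ge 0$ are assumed so that rational arithmetic is exact.

```python
from fractions import Fraction
from math import comb


def bracket(a, b, m, n):
    """(a, b; m, n) = b^m (1-b)^n - a^m (1-a)^n."""
    a, b = Fraction(a), Fraction(b)
    return b**m * (1 - b)**n - a**m * (1 - a)**n


def integral(a, b, m, n):
    """(a, 1, b; m, n): exact integral of y^m (1-y)^n over (a, b), integers m, n >= 0."""
    a, b = Fraction(a), Fraction(b)
    # Expand (1 - y)^n by the binomial theorem and integrate term by term.
    return sum(
        Fraction((-1)**j * comb(n, j), m + j + 1) * (b**(m + j + 1) - a**(m + j + 1))
        for j in range(n + 1)
    )


def recurrence_rhs(a, b, m, n):
    """Right-hand side of the reduction formula for (a, 1, b; m + 1, n)."""
    return (-bracket(a, b, m + 1, n + 1) + (m + 1) * integral(a, b, m, n)) / (m + n + 2)


# Compare both sides on a small grid of exponents (limits chosen arbitrarily).
a, b = Fraction(1, 5), Fraction(3, 4)
for m in range(4):
    for n in range(4):
        assert integral(a, b, m + 1, n) == recurrence_rhs(a, b, m, n)
```

Because every quantity is a `Fraction`, agreement here is exact rather than up to rounding.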

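(c) Let

$$(a, 2, 1, b; m, n) = \iint_{a<\theta_2<\theta_1<b} (\theta_1\theta_2)^m(1 - \theta_1)^n(1 - \theta_2)^n\{1, 2\}\,d\theta_1\,d\theta_2,$$

$$(a, 2, b, 1, c; m, n) = \iint_{a<\theta_2<b<\theta_1<c} (\theta_1\theta_2)^m(1 - \theta_1)^n(1 - \theta_2)^n\{1, 2\}\,d\theta_1\,d\theta_2,$$

and

$$(a, 3, b, 2, c, 1, d; m + 1, n) = \iiint_{a<\theta_3<b<\theta_2<c<\theta_1<d} (\theta_1\theta_2\theta_3)^{m+1}(1 - \theta_1)^n(1 - \theta_2)^n(1 - \theta_3)^n\{1, 2, 3\}\,d\theta_1\,d\theta_2\,d\theta_3.$$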


(d) Let

$$T^{m,n}_{a,b}\bigl(g(y)\bigr) = \int_a^b y^m(1 - y)^n g(y)\,dy;$$

then

$$T^{m,n}_{a,b}(0, y; k, l) = (a, 1, b; m + k, n + l),$$

and

$$T^{m,n}_{a,b}(b, 1, c; k, l) = (a, 1, b; m, n)\,(b, 1, c; k, l).$$


With these preliminaries we proceed to derive the distribution of the roots. 













4. Distribution of the largest root. Let us suppose that the roots are arranged
in decreasing order such that for $l$ roots we have

$$0 < \theta_l < \theta_{l-1} < \theta_{l-2} < \cdots < \theta_2 < \theta_1 < 1.$$

If the distribution density $R(l, \mu, \nu)$ given by (1) be expressed as

$$R(l, m, n) = C(l, m, n)\prod_{i=1}^{l}\theta_i^m\prod_{i=1}^{l}(1 - \theta_i)^n\prod_{i<j}(\theta_i - \theta_j),$$

then the distribution of the largest root in the general case would be given by

$$\Pr(\theta_1 \le x) = C(l, m, n)\,(0, l, l - 1, \dots, 2, 1, x; m, n).$$

Now we shall derive the distribution of the largest root for $l$ = 2, 3, 4, and 5.
(a) $l = 2$.

$$\Pr(\theta_1 \le x) = C(2, m, n)\,(0, 2, 1, x; m, n).$$


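$$\begin{aligned}
(0, 2, 1, x; m, n) &= \iint_{0<\theta_2<\theta_1<x} (\theta_1\theta_2)^m(1 - \theta_1)^n(1 - \theta_2)^n\{1, 2\}\,d\theta_1\,d\theta_2 \\
&= \iint_{0<\theta_2<\theta_1<x} \theta_2^m(1 - \theta_2)^n\,\theta_1^m(1 - \theta_1)^n\{1, 2\}\,d\theta_1\,d\theta_2 \\
&= \iint_{0<\theta_2<\theta_1<x} \theta_2^m(1 - \theta_2)^n\,\theta_1^{m+1}(1 - \theta_1)^n\,d\theta_1\,d\theta_2 \\
&\quad - \iint_{0<\theta_1<\theta_2<x} \theta_2^m(1 - \theta_2)^n\,\theta_1^{m+1}(1 - \theta_1)^n\,d\theta_1\,d\theta_2.
\end{aligned}$$

The limits in the successive integrals are to be so adjusted as to keep the
integrand the same. Then, using the notation given in section 3(d) and equation (4),

$$(0, 2, 1, x; m, n) = T^{m,n}_{0,x}(y, 1, x; m + 1, n) - T^{m,n}_{0,x}(0, 1, y; m + 1, n) \tag{5}$$

or

$$(0, 2, 1, x; m, n) = T^{m,n}_{0,x}\left[-\frac{(y, x; m + 1, n + 1)}{m + n + 2} + \frac{m + 1}{m + n + 2}\,(y, 1, x; m, n) + \frac{(0, y; m + 1, n + 1)}{m + n + 2} - \frac{m + 1}{m + n + 2}\,(0, 1, y; m, n)\right].$$

Now by a change in the order of integration,

$$T^{m,n}_{0,x}\bigl[(0, 1, y; m, n) - (y, 1, x; m, n)\bigr] = 0.$$

Therefore

$$\begin{aligned}
(m + n + 2)\,(0, 2, 1, x; m, n) &= T^{m,n}_{0,x}\bigl[2(0, y; m + 1, n + 1) - (0, x; m + 1, n + 1)\bigr] \\
&= 2(0, 1, x; 2m + 1, 2n + 1) - (0, x; m + 1, n + 1)(0, 1, x; m, n).
\end{aligned}$$

Hence

$$\begin{aligned}
\Pr(\theta_1 \le x) &= C(2, m, n)\,(0, 2, 1, x; m, n) \\
&= \frac{C(2, m, n)}{m + n + 2}\bigl[2(0, 1, x; 2m + 1, 2n + 1) - (0, x; m + 1, n + 1)(0, 1, x; m, n)\bigr] \\
&= \frac{C(2, m, n)}{m + n + 2}\left\{2\int_0^x y^{2m+1}(1 - y)^{2n+1}\,dy - x^{m+1}(1 - x)^{n+1}\int_0^x y^m(1 - y)^n\,dy\right\}.
\end{aligned}$$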


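(b) $l = 3$. For this case we need certain results for $l = 2$ which can be easily
obtained and are given below:

$$(a, 2, 1, b; m, n) = \frac{2}{m + n + 2}\,(a, 1, b; 2m + 1, 2n + 1) - \frac{1}{m + n + 2}\bigl[(0, a; m + 1, n + 1) + (0, b; m + 1, n + 1)\bigr] \times (a, 1, b; m, n) \tag{6}$$

and

$$(a, 2, b, 1, c; m, n) = \frac{1}{m + n + 2}\bigl[-(0, a; m + 1, n + 1)(b, 1, c; m, n) + (0, b; m + 1, n + 1)(a, 1, c; m, n) - (0, c; m + 1, n + 1)(a, 1, b; m, n)\bigr]. \tag{7}$$

Now

$$\begin{aligned}
(0, 3, 2, 1, x; m, n) &= \iiint_{0<\theta_3<\theta_2<\theta_1<x} (\theta_1\theta_2\theta_3)^m(1 - \theta_1)^n(1 - \theta_2)^n(1 - \theta_3)^n\{1, 2, 3\}\,d\theta_1\,d\theta_2\,d\theta_3 \\
&= \iiint_{0<\theta_3<\theta_2<\theta_1<x} (\theta_1\theta_2\theta_3)^m(1 - \theta_1)^n(1 - \theta_2)^n(1 - \theta_3)^n\bigl[\theta_1\theta_2\{1, 2\} + \theta_3\theta_1\{3, 1\} + \theta_2\theta_3\{2, 3\}\bigr]\,d\theta_1\,d\theta_2\,d\theta_3
\end{aligned}$$

(using equation (2))

$$= \left(\iiint_{0<\theta_3<\theta_2<\theta_1<x} + \iiint_{0<\theta_1<\theta_3<\theta_2<x} + \iiint_{0<\theta_2<\theta_1<\theta_3<x}\right) \theta_3^m(1 - \theta_3)^n(\theta_1\theta_2)^{m+1}(1 - \theta_1)^n(1 - \theta_2)^n\{1, 2\}\,d\theta_1\,d\theta_2\,d\theta_3$$

or

$$(0, 3, 2, 1, x; m, n) = T^{m,n}_{0,x}(y, 2, 1, x; m + 1, n) + T^{m,n}_{0,x}(0, 1, y, 2, x; m + 1, n) + T^{m,n}_{0,x}(0, 2, 1, y; m + 1, n),$$

but the $\theta$'s are to be always arranged in the same order, hence

$$(0, 3, 2, 1, x; m, n) = T^{m,n}_{0,x}(y, 2, 1, x; m + 1, n) - T^{m,n}_{0,x}(0, 2, y, 1, x; m + 1, n) + T^{m,n}_{0,x}(0, 2, 1, y; m + 1, n).$$

Using equations (6) and (7), we have

$$\begin{aligned}
(0, 3, 2, 1, x; m, n) &= \frac{T^{m,n}_{0,x}}{m + n + 3}\Bigl\{2(y, 1, x; 2m + 3, 2n + 1) - (y, 1, x; m + 1, n)\bigl[(0, y; m + 2, n + 1) + (0, x; m + 2, n + 1)\bigr] \\
&\qquad - (0, 1, x; m + 1, n)(0, y; m + 2, n + 1) + (0, 1, y; m + 1, n)(0, x; m + 2, n + 1) \\
&\qquad + 2(0, 1, y; 2m + 3, 2n + 1) - (0, 1, y; m + 1, n)(0, y; m + 2, n + 1)\Bigr\} \\
&= \frac{T^{m,n}_{0,x}}{m + n + 3}\Bigl\{2(y, 1, x; 2m + 3, 2n + 1) + 2(0, 1, y; 2m + 3, 2n + 1) \\
&\qquad - (0, y; m + 2, n + 1)\bigl[(0, 1, x; m + 1, n) + (0, 1, y; m + 1, n) + (y, 1, x; m + 1, n)\bigr] \\
&\qquad - (0, x; m + 2, n + 1)\bigl[(y, 1, x; m + 1, n) - (0, 1, y; m + 1, n)\bigr]\Bigr\} \\
&= \frac{T^{m,n}_{0,x}}{m + n + 3}\Bigl\{2(0, 1, x; 2m + 3, 2n + 1) - 2(0, y; m + 2, n + 1)(0, 1, x; m + 1, n) \\
&\qquad - (0, x; m + 2, n + 1)\bigl[(y, 1, x; m + 1, n) - (0, 1, y; m + 1, n)\bigr]\Bigr\}.
\end{aligned}$$

Using equation (5), we have

$$(0, 3, 2, 1, x; m, n) = \frac{1}{m + n + 3}\bigl\{2(0, 1, x; 2m + 3, 2n + 1)(0, 1, x; m, n) - 2(0, 1, x; 2m + 2, 2n + 1)(0, 1, x; m + 1, n) - (0, x; m + 2, n + 1)(0, 2, 1, x; m, n)\bigr\}.$$

Hence

$$\Pr(\theta_1 \le x) = \frac{C(3, m, n)}{m + n + 3}\bigl\{2(0, 1, x; 2m + 3, 2n + 1)(0, 1, x; m, n) - 2(0, 1, x; 2m + 2, 2n + 1)(0, 1, x; m + 1, n) - (0, x; m + 2, n + 1)(0, 2, 1, x; m, n)\bigr\}. \tag{8}$$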
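The closed forms for $l = 2$ and $l = 3$ can be compared with the defining multiple integrals when $m$ and $n$ are non-negative integers. The following is a minimal sketch under that assumption, not the author's procedure: it expands $\{1, \dots, l\}$ as a determinant, integrates each monomial term as an iterated integral over the ordered region, and uses exact rational arithmetic. All function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import comb


def weight(m, n):
    """Coefficients (lowest degree first) of y^m (1 - y)^n."""
    return [Fraction(0)] * m + [Fraction((-1)**j * comb(n, j)) for j in range(n + 1)]


def mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def antiderivative(p):
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(p)]


def evaluate(p, y):
    return sum((c * y**k for k, c in enumerate(p)), Fraction(0))


def tail_integral(p, x):
    """Polynomial in t equal to the integral of p over (t, x)."""
    F = antiderivative(p)
    out = [-c for c in F]
    out[0] += evaluate(F, x)
    return out


def sign(perm):
    inv = sum(perm[i] > perm[j] for i in range(len(perm)) for j in range(i + 1, len(perm)))
    return -1 if inv % 2 else 1


def ordered_integral(l, x, m, n):
    """(0, l, ..., 2, 1, x; m, n) computed directly from its definition."""
    x = Fraction(x)
    total = Fraction(0)
    # prod_{i<j} (t_i - t_j) = det[t_i^(l-j)] = signed sum of monomials over permutations.
    for perm in permutations(range(l)):
        inner = [Fraction(1)]
        for i in range(l):  # theta_1 (the largest root) is integrated first
            f = weight(m + l - 1 - perm[i], n)
            inner = tail_integral(mul(f, inner), x)
        total += sign(perm) * evaluate(inner, Fraction(0))
    return total


def I(x, m, n):
    """(0, 1, x; m, n)."""
    return evaluate(antiderivative(weight(m, n)), Fraction(x))


def X(x, m, n):
    """(0, x; m, n) = x^m (1 - x)^n for m > 0."""
    x = Fraction(x)
    return x**m * (1 - x)**n


def closed_l2(x, m, n):
    """Closed form for (0, 2, 1, x; m, n)."""
    return (2 * I(x, 2*m + 1, 2*n + 1) - X(x, m + 1, n + 1) * I(x, m, n)) / (m + n + 2)


def closed_l3(x, m, n):
    """Closed form for (0, 3, 2, 1, x; m, n), i.e. (8) without the constant C(3, m, n)."""
    return (2 * I(x, 2*m + 3, 2*n + 1) * I(x, m, n)
            - 2 * I(x, 2*m + 2, 2*n + 1) * I(x, m + 1, n)
            - X(x, m + 2, n + 1) * closed_l2(x, m, n)) / (m + n + 3)


x = Fraction(3, 5)
print(ordered_integral(2, x, 2, 1) == closed_l2(x, 2, 1))
print(ordered_integral(3, x, 1, 2) == closed_l3(x, 1, 2))
```

The same `ordered_integral` routine works for larger $l$, at the cost of $l!$ terms, so it could serve as an independent check on the longer formulas that follow.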


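(c) $l = 4$. In order to determine $(0, 4, 3, 2, 1, x; m, n)$ we need the values of
$(a, 3, 2, 1, b; m, n)$, $(a, 3, b, 2, 1, c; m, n)$ and $(a, 3, 2, b, 1, c; m, n)$, which are
obtained according to the procedure given above.

Now

$$\begin{aligned}
(0, 4, 3, 2, 1, x; m, n) &= \int_{0<\theta_4<\theta_3<\theta_2<\theta_1<x} \theta_4^m(1 - \theta_4)^n(\theta_1\theta_2\theta_3)^m(1 - \theta_1)^n(1 - \theta_2)^n(1 - \theta_3)^n\{1, 2, 3, 4\}\,d\theta_1\,d\theta_2\,d\theta_3\,d\theta_4 \\
&= \int_{0<\theta_4<\theta_3<\theta_2<\theta_1<x} \theta_4^m(1 - \theta_4)^n(\theta_1\theta_2\theta_3)^m(1 - \theta_1)^n(1 - \theta_2)^n(1 - \theta_3)^n \\
&\qquad \cdot \bigl[\theta_1\theta_2\theta_3\{1, 2, 3\} - \theta_4\theta_1\theta_2\{4, 1, 2\} + \theta_3\theta_4\theta_1\{3, 4, 1\} - \theta_2\theta_3\theta_4\{2, 3, 4\}\bigr]\,d\theta_1\,d\theta_2\,d\theta_3\,d\theta_4 \\
&= \left(\int_{0<\theta_4<\theta_3<\theta_2<\theta_1<x} - \int_{0<\theta_1<\theta_4<\theta_3<\theta_2<x} + \int_{0<\theta_2<\theta_1<\theta_4<\theta_3<x} - \int_{0<\theta_3<\theta_2<\theta_1<\theta_4<x}\right) \\
&\qquad \theta_4^m(1 - \theta_4)^n\bigl[(\theta_1\theta_2\theta_3)^{m+1}(1 - \theta_1)^n(1 - \theta_2)^n(1 - \theta_3)^n\bigr]\{1, 2, 3\}\,d\theta_1\,d\theta_2\,d\theta_3\,d\theta_4 \\
&= T^{m,n}_{0,x}(y, 3, 2, 1, x; m + 1, n) - T^{m,n}_{0,x}(0, 1, y, 3, 2, x; m + 1, n) \\
&\qquad + T^{m,n}_{0,x}(0, 2, 1, y, 3, x; m + 1, n) - T^{m,n}_{0,x}(0, 3, 2, 1, y; m + 1, n) \\
&= T^{m,n}_{0,x}(y, 3, 2, 1, x; m + 1, n) - T^{m,n}_{0,x}(0, 3, y, 2, 1, x; m + 1, n) \\
&\qquad + T^{m,n}_{0,x}(0, 3, 2, y, 1, x; m + 1, n) - T^{m,n}_{0,x}(0, 3, 2, 1, y; m + 1, n).
\end{aligned}$$

Using the results of $(a, 3, 2, 1, b; m, n)$, $(a, 3, b, 2, 1, c; m, n)$ and $(a, 3, 2, b, 1, c; m, n)$,
we have $\Pr(\theta_1 \le x)$ equal to

$$\begin{aligned}
C(4, m, n)\,(0, 4, 3, 2, 1, x; m, n) = \frac{C(4, m, n)}{m + n + 4}\Bigl\{&2(0, 1, x; 2m + 5, 2n + 1)(0, 2, 1, x; m, n) \\
&- \frac{2(0, 1, x; 2m + 4, 2n + 1)}{m + n + 3}\bigl[2(0, 1, x; 2m + 2, 2n + 1) - (0, x; m + 2, n + 1)(0, 1, x; m, n) + (m + 2)(0, 2, 1, x; m, n)\bigr] \\
&+ 2(0, 1, x; 2m + 3, 2n + 1)(0, 2, 1, x; m + 1, n) \\
&- (0, x; m + 3, n + 1)(0, 3, 2, 1, x; m, n)\Bigr\}.
\end{aligned}$$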


(d) $l = 5$. In the evaluation of the distribution of the largest root for $l = 5$,
the following parts need to be calculated:

$(a, 4, 3, 2, 1, b; m, n)$, $(a, 4, b, 3, 2, 1, c; m, n)$, $(a, 4, 3, b, 2, 1, c; m, n)$,
$(a, 4, 3, 2, b, 1, c; m, n)$.

Proceeding along the lines indicated in the previous sections we get

$$\begin{aligned}
\Pr(\theta_1 \le x) = \frac{C(5, m, n)}{m + n + 5}\Bigl[\,&2(0, 1, x; 2m + 7, 2n + 1)(0, 3, 2, 1, x; m, n) \\
&- \frac{2(0, 1, x; 2m + 6, 2n + 1)}{m + n + 4}\bigl\{2(0, 1, x; 2m + 4, 2n + 1)(0, 1, x; m, n) \\
&\qquad - 2(0, 1, x; 2m + 3, 2n + 1)(0, 1, x; m + 1, n) \\
&\qquad - (0, x; m + 3, n + 1)(0, 2, 1, x; m, n) + (m + 3)(0, 3, 2, 1, x; m, n)\bigr\} \\
&+ \frac{2(0, 1, x; 2m + 5, 2n + 1)}{m + n + 4}\Bigl\{2(0, 1, x; 2m + 5, 2n + 1)(0, 1, x; m, n) \\
&\qquad - 2(0, 1, x; 2m + 3, 2n + 1)(0, 1, x; m + 2, n) \\
&\qquad - \frac{(0, x; m + 3, n + 1)}{m + n + 3}\bigl[2(0, 1, x; 2m + 2, 2n + 1) - (0, x; m + 2, n + 1)(0, 1, x; m, n) + (m + 2)(0, 2, 1, x; m, n)\bigr]\Bigr\} \\
&- 2(0, 3, 2, 1, x; m + 1, n)(0, 1, x; 2m + 4, 2n + 1) \\
&- (0, x; m + 4, n + 1)(0, 4, 3, 2, 1, x; m, n)\Bigr].
\end{aligned}$$

It is evident now that the above method can be used to derive the distribution
for any value of $l$.


5. Distribution of the smallest root. Let $\Pr[\theta_1 \le x \mid \mu, \nu] = P(x \mid \mu, \nu)$ where
$\theta_1$ is the largest root. Let us make the following transformations in the $R(l, \mu, \nu)$
distribution:

$$r_1 = 1 - \theta_l,\quad r_2 = 1 - \theta_{l-1},\quad \dots,\quad r_l = 1 - \theta_1;$$

then since $0 < \theta_l < \theta_{l-1} < \cdots < \theta_1 < 1$, we have $0 < r_l < r_{l-1} < \cdots < r_1 < 1$,
and thus the domain of integration does not change. Hence the joint
distribution of the $r$'s can be expressed as

$$C(l, \nu, \mu)\prod_{i=1}^{l} r_i^{(\nu-2)/2}\prod_{i=1}^{l}(1 - r_i)^{(\mu-2)/2}\prod_{i<j}(r_i - r_j), \qquad 0 < r_l < \cdots < r_1 < 1.$$

Thus the $r$'s have the same distribution as the $\theta$'s, but $\mu$ and $\nu$ are
interchanged. Therefore

$$\Pr(\theta_l \le x) = \Pr(1 - r_1 \le x) = 1 - \Pr(r_1 \le 1 - x) = 1 - P(1 - x \mid \nu, \mu).$$

Hence, for getting the distribution of the smallest root, we have to change $x$
into $1 - x$ and interchange $m$, $n$ in the distributions of the largest roots and
subtract the resultant probability from 1. The distributions for the smallest root
are given below for $l$ = 2, 3, 4 and 5.

(i) $l = 2$.

$$\begin{aligned}
\Pr(\theta_2 \le x) &= 1 - \Pr(\theta_1 \le 1 - x \mid n, m) \\
&= 1 - \frac{C(2, n, m)}{m + n + 2}\bigl\{2(0, 1, 1 - x; 2n + 1, 2m + 1) - (0, 1 - x; n + 1, m + 1)(0, 1, 1 - x; n, m)\bigr\}.
\end{aligned} \tag{11}$$

(ii) $l = 3$.

$$\begin{aligned}
\Pr(\theta_3 \le x) = 1 - \frac{C(3, n, m)}{m + n + 3}\bigl\{&2(0, 1, 1 - x; 2n + 3, 2m + 1)(0, 1, 1 - x; n, m) \\
&- 2(0, 1, 1 - x; n + 1, m)(0, 1, 1 - x; 2n + 2, 2m + 1) \\
&- (0, 1 - x; n + 2, m + 1)(0, 2, 1, 1 - x; n, m)\bigr\}.
\end{aligned} \tag{12}$$

(iii) $l = 4$.

$$\begin{aligned}
\Pr(\theta_4 \le x) = 1 - \frac{C(4, n, m)}{m + n + 4}\Bigl\{&2(0, 1, 1 - x; 2n + 5, 2m + 1)(0, 2, 1, 1 - x; n, m) \\
&- \frac{2(0, 1, 1 - x; 2n + 4, 2m + 1)}{m + n + 3}\bigl[2(0, 1, 1 - x; 2n + 2, 2m + 1) \\
&\qquad - (0, 1 - x; n + 2, m + 1)(0, 1, 1 - x; n, m) + (n + 2)(0, 2, 1, 1 - x; n, m)\bigr] \\
&+ 2(0, 1, 1 - x; 2n + 3, 2m + 1)(0, 2, 1, 1 - x; n + 1, m) \\
&- (0, 1 - x; n + 3, m + 1)(0, 3, 2, 1, 1 - x; n, m)\Bigr\}.
\end{aligned} \tag{13}$$

(iv) $l = 5$.

$$\begin{aligned}
\Pr(\theta_5 \le x) = 1 - \frac{C(5, n, m)}{m + n + 5}\Bigl[\,&2(0, 1, 1 - x; 2n + 7, 2m + 1)(0, 3, 2, 1, 1 - x; n, m) \\
&- \frac{2(0, 1, 1 - x; 2n + 6, 2m + 1)}{m + n + 4}\bigl\{2(0, 1, 1 - x; 2n + 4, 2m + 1)(0, 1, 1 - x; n, m) \\
&\qquad - 2(0, 1, 1 - x; 2n + 3, 2m + 1)(0, 1, 1 - x; n + 1, m) \\
&\qquad - (0, 1 - x; n + 3, m + 1)(0, 2, 1, 1 - x; n, m) + (n + 3)(0, 3, 2, 1, 1 - x; n, m)\bigr\} \\
&+ \frac{2(0, 1, 1 - x; 2n + 5, 2m + 1)}{m + n + 4}\Bigl\{2(0, 1, 1 - x; 2n + 5, 2m + 1)(0, 1, 1 - x; n, m) \\
&\qquad - 2(0, 1, 1 - x; 2n + 3, 2m + 1)(0, 1, 1 - x; n + 2, m) \\
&\qquad - \frac{(0, 1 - x; n + 3, m + 1)}{m + n + 3}\bigl[2(0, 1, 1 - x; 2n + 2, 2m + 1) \\
&\qquad\quad - (0, 1 - x; n + 2, m + 1)(0, 1, 1 - x; n, m) + (n + 2)(0, 2, 1, 1 - x; n, m)\bigr]\Bigr\} \\
&- 2(0, 3, 2, 1, 1 - x; n + 1, m)(0, 1, 1 - x; 2n + 4, 2m + 1) \\
&- (0, 1 - x; n + 4, m + 1)(0, 4, 3, 2, 1, 1 - x; n, m)\Bigr].
\end{aligned} \tag{14}$$


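6. Distribution of any intermediate root.

(i) $l = 3$.

$$\begin{aligned}
\Pr(\theta_2 \le x) &= \Pr(0 < \theta_3 < \theta_2 < \theta_1 < x) + \Pr(0 < \theta_3 < \theta_2 < x < \theta_1 < 1) \\
&= C(3, m, n)\bigl[(0, 3, 2, 1, x; m, n) + (0, 3, 2, x, 1; m, n)\bigr]
\end{aligned}$$

as the two events are mutually exclusive, or

$$\begin{aligned}
\Pr(\theta_2 \le x) &= C(3, m, n)\bigl[(0, 3, 2, 1, x; m, n) + (0, 3, 2, x, 1, z; m, n)\bigr], \quad\text{where } z = 1, \\
&= \frac{C(3, m, n)}{m + n + 3}\Bigl\{2(0, 1, x; 2m + 3, 2n + 1)(0, 1, x; m, n) \\
&\qquad - 2(0, 1, x; m + 1, n)(0, 1, x; 2m + 2, 2n + 1) \\
&\qquad - (0, x; m + 2, n + 1)(0, 2, 1, x; m, n) \\
&\qquad + (x, 1, z; m, n)\bigl[2(0, 1, x; 2m + 3, 2n + 1) - (0, x; m + 2, n + 1)(0, 1, x; m + 1, n)\bigr] \\
&\qquad - \frac{(x, z; m + 2, n + 1)}{m + n + 2}\bigl[2(0, 1, x; 2m + 1, 2n + 1) - (0, x; m + 1, n + 1)(0, 1, x; m, n)\bigr] \\
&\qquad + (x, 1, z; m + 1, n)(0, 1, x; m, n)(0, x; m + 2, n + 1) \\
&\qquad - 2(x, 1, z; m + 1, n)(0, 1, x; 2m + 2, 2n + 1)\Bigr\}.
\end{aligned} \tag{15}$$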







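(ii) $l = 4$.

$$\begin{aligned}
\Pr(\theta_2 \le x) &= \Pr(0 < \theta_4 < \theta_3 < \theta_2 < \theta_1 < x;\ m, n) + \Pr(0 < \theta_4 < \theta_3 < \theta_2 < x < \theta_1 < 1;\ m, n) \\
&= C(4, m, n)\bigl[(0, 4, 3, 2, 1, x; m, n) + (0, 4, 3, 2, x, 1; m, n)\bigr]
\end{aligned}$$

and

$$\begin{aligned}
\Pr(\theta_3 \le x) &= \Pr(0 < \theta_4 < \theta_3 < \theta_2 < \theta_1 < x;\ m, n) + \Pr(0 < \theta_4 < \theta_3 < \theta_2 < x < \theta_1 < 1;\ m, n) \\
&\quad + \Pr(0 < \theta_4 < \theta_3 < x < \theta_2 < \theta_1 < 1;\ m, n) \\
&= C(4, m, n)\bigl[(0, 4, 3, 2, 1, x; m, n) + (0, 4, 3, 2, x, 1; m, n) + (0, 4, 3, x, 2, 1; m, n)\bigr].
\end{aligned}$$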


The different parts of these probabilities can be evaluated as indicated in sec- 
tion 4(d). Thus the method already indicated to obtain the distribution of the 
largest root also gives the distribution of any one of the roots. 


7. Further problems. It is intended to prepare the probability distribution 
tables for small values of 1. The results obtained in this paper are found to be 
useful in finding the distribution of the sum of the roots when the numbers of 
canonical variates in two sets differ by one. This problem is, however, being 
investigated further. 

Acknowledgements. The author is highly indebted to Dr. P. L. Hsu for
suggesting the problem and for guidance in this research, and is also thankful to Dr.
Harold Hotelling for his suggestions and help in this work.


REFERENCES 


[1] P. L. Hsu, "On the distribution of roots of certain determinantal equations", Annals of
Eugenics, Vol. 3 (1939), pp. 250-258.

[2] S. N. Roy, "The individual sampling distribution of the maximum, the minimum and
any intermediate of the 'p'-statistics on the null hypothesis", Sankhyā, December, 1943.











A k-SAMPLE SLIPPAGE TEST FOR AN EXTREME POPULATION 


By FREDERICK MOSTELLER 


Harvard University 


1. Summary. A test is proposed for deciding whether one of k populations
has slipped to the right of the rest, under the null hypothesis that all populations
are continuous and identical. The procedure is to pick the sample with the largest
observation, and to count the number of observations r in it which exceed all
observations of all other samples. If all samples are of the same size n, n large,
the probability of getting r or more such observations, when the null hypothesis
is true, is about $k^{1-r}$.

Some remarks are made about kinds of errors in testing hypotheses. 


2. Introduction. The purpose of this paper is to describe a significance test
connected with a statistical question called by the present author "the problem
of the greatest one." Suppose there are several continuous populations $f(x - a_1)$,
$f(x - a_2)$, ..., $f(x - a_k)$, which are identical except for rigid translations or
slippages. Suppose further that the form of the populations and the values of
the $a_i$ are unknown. Then on the basis of samples from the k populations we
may wish to test the hypothesis that some population has slipped further to the
right, say, than any other. In other words, we may ask whether there exists an
$a_i > \max(a_1, a_2, \dots, a_{i-1}, a_{i+1}, \dots, a_k)$. From the point of view of testing
hypotheses, the existence of such an $a_i$ is taken to be the alternative hypothesis.
A significance test will depend also on the null hypothesis. We shall take as the
null hypothesis the assumption that all the a's are equal: $a_1 = a_2 = \cdots = a_k$.

Using these assumptions it is possible to obtain parameter-free significance
tests that some population has a larger location parameter (mean, median,
quantile, say) than any of the other populations.

The problem of the greatest one is of considerable practical importance. 
Among several processes, techniques, or therapies of approximately equal cost, 
we often wish to pick out the best one as measured by some characteristic. 
Furthermore, we often wish to make a test of the significance of one of the 
methods against the others after noticing that on the basis of the sample values, 
a particular method seems to be best. The test provided in this paper allows 
an opportunity for inspection of the data before applying the test of significance. 

The proposed test has the advantage of being rapid and easy to apply. How- 
ever, the test is probably not very powerful, and in the form presented here, the 
test depends on having samples of the same size from each of the several popula- 
tions. The equal-sample restriction is not essential to the technique, but since 
no very useful way of computing the significance levels for the unequal-sample 
case is known to the author, it does not seem worthwhile to give the formulas. 
They are easy to write down. 



3. The test. Suppose we have k samples of size n each. It is desired to 
test the alternative hypothesis that one of the populations, from which the 
samples were drawn, has been rigidly translated to the right relative to the re- 
maining populations. The null hypothesis is that all the populations have the 
same location parameter. 

The test consists in arranging the observations in all the samples from greatest
to least, and observing for the sample with the largest observation, the number
of observations r which exceed all the observations in the k − 1 other samples.
If $r \ge r_0$ we reject the null hypothesis that all the populations are identical;
instead we accept the hypothesis that the sample with the largest observation
came from the population with the rightmost location parameter, i.e. that this
population has slipped to the right of the rest. If $r < r_0$ we accept the null
hypothesis.

The statements just made are not quite usual for accepting and rejecting 
hypotheses. Classically one would merely accept or reject the hypothesis that 
the a; are all equal. The statements just made seem preferable for the present 
purpose. 

Example. The following data arranged from least to greatest indicate the 
difference in log reaction times of an individual and a control group to three 
types of words on a word-association test. The differences in log reaction 
times have been multiplied by 100 for convenience. Longer reaction times for 
the individual are positive, shorter ones are negative. Does one type of word 
require a shorter reaction time for the individual relative to the control group 
than any other? 


Concrete Abstract Emotional 
—6 —16 —6 
—6 —11 —5 
—5 —3 —3 
—5 —2 —2 
—4 —2 —1 
—3 —1 0 
—1 —1 1 

0 1 3 
0 1 5 
3 1 12 
a 8 13 
11 10 13 
12 16 15 
29 20 28 


Here we have k = 3 samples of size n = 14 each. We note that the Abstract
column has the most negative deviation, −16, and that there are two observations
in that column which are less than all the observations in the other columns.
Consequently r = 2. Under the null hypothesis the probability of obtaining
2 or more observations in one column less than all the observations in
the others is about .33, so the null hypothesis is not rejected.
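The count r in this example can be reproduced mechanically. The sketch below is illustrative only: it lists just the seven smallest observations of each column (the larger ones cannot affect a lower-tail count), the function names are not from the paper, and the exact probability uses the closed form $P(r) = k\,C^n_r / C^{kn}_r$ derived in the next section.

```python
from fractions import Fraction
from math import comb

# Seven smallest observations per column, copied from the table above.
# Larger observations are omitted because they cannot change r in the lower tail.
lower_tails = {
    "Concrete": [-6, -6, -5, -5, -4, -3, -1],
    "Abstract": [-16, -11, -3, -2, -2, -1, -1],
    "Emotional": [-6, -5, -3, -2, -1, 0, 1],
}
k, n = 3, 14  # number of samples and common sample size, as stated in the text


def slippage_count(samples):
    """Return (name, r): the sample holding the smallest value and the number of
    its values lying below every value of all the other samples."""
    name = min(samples, key=lambda s: min(samples[s]))
    floor = min(min(v) for s, v in samples.items() if s != name)
    return name, sum(x < floor for x in samples[name])


def tail_probability(k, n, r):
    """Exact null probability of r or more: k * C(n, r) / C(kn, r)."""
    return Fraction(k * comb(n, r), comb(k * n, r))


name, r = slippage_count(lower_tails)
p_exact = tail_probability(k, n, r)
p_large_n = k ** (1 - r)  # the large-n value quoted as "about .33"
print(name, r, float(p_exact), p_large_n)
```

For n = 14 the exact value is slightly below the large-sample figure, which does not change the conclusion that the null hypothesis is not rejected.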


4. Derivation of test. Suppose we have k samples of size n, all drawn from
the same continuous distribution function f(x). Arranging observations within
samples in order of magnitude, the samples $O_i$ are: $O_1$: $x_{11}, x_{12}, \dots, x_{1n}$;
$O_2$: $x_{21}, x_{22}, \dots, x_{2n}$; ...; $O_k$: $x_{k1}, x_{k2}, \dots, x_{kn}$.

If we consider some one sample $O_i$ separately, we can inquire about the
probability that exactly r of its observations are greater than the greatest
observation in the other k − 1 samples.

The total number of arrangements of the kn observations is

$$\frac{(kn)!}{(n!)^k}. \tag{1}$$

The number of ways of getting all n observations of $O_i$ to be greater than all
observations in the remaining samples is

$$N(n) = \frac{[(k - 1)n]!}{(n!)^{k-1}}. \tag{2}$$

The number of ways of getting exactly n − 1 observations of $O_i$ greater than
all observations in the remaining samples is

$$N(n - 1) = \frac{[(k - 1)n + 1]!}{(n!)^{k-1}\,1!} - \frac{[(k - 1)n]!}{(n!)^{k-1}}. \tag{3}$$

More generally, the number of ways of getting exactly r = n − u of $O_i$ to be
greater than all other observations in the remaining samples is

$$N(n - u) = \frac{[(k - 1)n + u]!}{(n!)^{k-1}\,u!} - \frac{[(k - 1)n + u - 1]!}{(n!)^{k-1}\,(u - 1)!}. \tag{4}$$

Therefore the number of ways of getting a run of r = n − u or more observations
in $O_i$ greater than the rest is just

$$S(n - u) = \sum_{t=0}^{u} N(n - t) = \frac{[(k - 1)n + u]!}{(n!)^{k-1}\,u!}. \tag{5}$$

However we do not choose our sample $O_i$ at random or preassign it, as the
demonstration has thus far supposed. Instead we choose that $O_i$ which has
the greatest observation in all the samples. This condition requires us to multiply
$S(n - u)$ by the factor k. Consequently the probability that the sample
with the largest observation has r = n − u or more observations which exceed
all observations in the other k − 1 samples is given by

$$P(r) = \frac{k\,n!\,(kn - r)!}{(kn)!\,(n - r)!}. \tag{6}$$




As an incidental check we note in passing that 


We note that equation (6) may be rewritten as
$$P(r) = k\,C^{n}_{r} \big/ C^{kn}_{r}, \tag{7}$$
which is a useful form for some computations.
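As an illustration of why the binomial form is convenient, the sketch below evaluates P(r) both from the factorial expression and from the ratio of binomial coefficients and prints a few rows in the layout of Table I. The function names are illustrative and not part of the paper.

```python
from fractions import Fraction
from math import comb, factorial


def p_factorial(k, n, r):
    """P(r) from the factorial form: k * n! * (kn - r)! / ((kn)! * (n - r)!)."""
    return Fraction(k * factorial(n) * factorial(k * n - r),
                    factorial(k * n) * factorial(n - r))


def p_binomial(k, n, r):
    """P(r) from the binomial form: k * C(n, r) / C(kn, r)."""
    return Fraction(k * comb(n, r), comb(k * n, r))


# Print P(r) for r = 2, 3, ... (up to 6 or n) for a few (k, n) pairs.
for k, n in [(2, 10), (3, 3), (4, 5)]:
    row = [round(float(p_binomial(k, n, r)), 4) for r in range(2, min(n, 6) + 1)]
    print(f"k={k} n={n}:", row)
```

Both functions return exact fractions, so they can be compared with `==`; the binomial form avoids the very large factorials that appear when kn is big.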

Table I gives the probability of observing r or more observations in the
sample with the largest observation, among k samples of size n, which are more
extreme in a preassigned direction than any of the observations in the remaining
k − 1 samples.


5. Approximations. If we use Stirling's formula and approximations for
$(1 + a)^r$, for small values of a and r, we can write an approximation for equation
(6) for large values of n with r and k fixed as follows:

$$P(r) \sim \frac{1}{k^{r-1}}\left(1 - \frac{r(r - 1)(k - 1)}{2kn}\right). \tag{8}$$

For very large n equation (8) yields

$$P(r) \sim \frac{1}{k^{r-1}}, \tag{9}$$

which is the value given in Table I for n = ∞. For many purposes the result
given by equation (9) is quite adequate, as a glance at Table I will indicate.
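The quality of the two approximations can be inspected numerically. The sketch below assumes that (8) reads $k^{1-r}\,[1 - r(r-1)(k-1)/(2kn)]$ and that (9) is $k^{1-r}$; the function names are illustrative.

```python
from math import comb


def exact(k, n, r):
    """Exact P(r) = k * C(n, r) / C(kn, r)."""
    return k * comb(n, r) / comb(k * n, r)


def first_order(k, n, r):
    """Approximation (8), as read here: k^(1-r) * (1 - r(r-1)(k-1) / (2kn))."""
    return k ** (1 - r) * (1 - r * (r - 1) * (k - 1) / (2 * k * n))


def limiting(k, r):
    """Approximation (9): k^(1-r)."""
    return k ** (1 - r)


# Tabulate exact value, first-order approximation and limiting value.
for k, n, r in [(2, 25, 2), (3, 25, 3), (4, 100, 3)]:
    print(k, n, r, round(exact(k, n, r), 4),
          round(first_order(k, n, r), 4), round(limiting(k, r), 4))
```

The first-order correction matters mostly for small n and larger r; as n grows all three values agree to the precision of Table I.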


6. Kinds of errors. In tests such as the one being considered here the classical 
two kinds of errors are not quite adequate to describe the situation. 
As usual we may make the errors of
I) rejecting the null hypothesis when it is true,
II) accepting the null hypothesis when it is false.
But there is a third kind of error which is of interest because the present test of
significance is tied up closely with the idea of making a correct decision about
which distribution function has slipped furthest to the right. We may make
the error of
III) correctly rejecting the null hypothesis for the wrong reason.
In other words it is possible for the null hypothesis to be false. It is also possible
to reject the null hypothesis because some sample $O_i$ has too many observations
which are greater than all observations in the other samples. But
the population from which some other sample, say $O_j$, is drawn is in fact the
rightmost population. In this case we have committed an error of the third kind.
When we come to the power of the test under consideration we shall compute
the probability that we reject the null hypothesis because the rightmost population
yields a sample with too many large observations. Thus by the power of
this test we shall mean the probability of both correct rejection and correct
choice of rightmost population, when it exists.


TABLE I

Probability of one of k samples of size n each having r or more observations larger
than those of the other k − 1 samples

k = 2
   n     r=2     r=3     r=4     r=5     r=6
   3    .400    .100
   5    .444    .167    .048    .008
   7    .462    .192    .070    .021    .005
  10    .474    .211    .087    .033    .011
  15    .483    .224    .100    .042    .017
  20    .487    .231    .106    .047    .020
  25    .490    .235    .110    .050    .022
   ∞    .500    .250    .125    .062    .031

k = 3
   n     r=2     r=3     r=4     r=5
   3    .250    .036
   5    .286    .066    .011    .001
   7    .300    .079    .018    .003
  10    .310    .089    .023    .005
  15    .318    .096    .027    .007
  20    .322    .100    .030    .009
  25    .324    .102    .031    .009
   ∞    .333    .111    .037    .012

k = 4
   n     r=2     r=3     r=4     r=5     r=6
   3    .182    .018
   5    .211    .035    .004   .0003
   7    .222    .043    .007   .0009   .0001
  10    .231    .049    .009   .0015   .0002
  15    .237    .053    .011   .0022   .0004
  20    .241    .056    .012   .0026   .0005
  25    .242    .057    .013   .0028   .0006
   ∞    .250    .062    .016   .0039   .0010

k = 5
   n     r=2     r=3     r=4     r=5
   3    .143    .011
   5    .167    .022   .0020   .0001
   7    .177    .027   .0033   .0003
  10    .184    .031   .0046   .0006
  15    .189    .034   .0056   .0008
  20    .192    .035   .0062   .0010
  25    .194    .036   .0065   .0011
   ∞    .200    .040   .0080   .0016

k = 6
   n     r=2     r=3     r=4     r=5
   3    .118    .007
   5    .138    .015   .0011   .0000
   7    .146    .018   .0019   .0001
  10    .152    .021   .0026   .0003
  15    .157    .023   .0032   .0004
  20    .160    .024   .0035   .0005
  25    .161    .025   .0037   .0005
   ∞    .167    .028   .0046   .0008

Errors of the third kind happen in conventional tests of differences of means,
but they are usually not considered although their existence is probably recognized.
It seems to the author that there may be several reasons for this among
which are 1) a preoccupation on the part of mathematical statisticians with the
formal questions of acceptance and rejection of null hypotheses without adequate
consideration of the implications of the error of the third kind for the
practical experimenter, 2) the rarity with which an error of the third kind arises
in the usual tests of significance.

In passing we note further that it is possible in the present problem for both 
the null hypothesis and the alternative hypothesis to be false when k > 2. This 
may happen when there are, say, two identical rightmost populations, and the 
remaining populations are shifted to the left. An examination of Table I will 
give us an idea of what will happen in such a case. If k = 4, we use r = 3 as 
about the .05 level. If two of the populations are slipped very far to the left, 
while the rightmost two populations are identical, in effect k = 2. In this case 
the probability of rejecting the null hypothesis is around .2. Consequently we 
accept the null hypothesis about 80 per cent of the time, and reject it 20 per cent 
of the time under these conditions. But neither hypothesis was true. 

If we carry the discussion to its ultimate conclusion we would need a fourth 
kind of error for these troublesome situations. There are still other kinds of 
errors which will not be considered here. 


7. The power of the test. It is difficult to discuss the power of a non-para- 
metric test, but in the present case it may be worthwhile to give an example or 
two. The reader will understand that although the test is called non-parametric, 
its power does depend on the distribution function. 

In the case of k samples there are two extremes which might be considered for
any particular form of distribution function. In Case 1, we suppose that
when the alternative hypothesis is true, k − 1 of the populations are identical
with distribution function f(x), while the remaining distribution function is
f(x − a), a > 0. Case 1 may be regarded as a lower bound to the power of the
test because for any fixed distance a between the location parameters of the
rightmost population and the next rightmost population, Case 1 gives the least
chance of detecting the falsity of the null hypothesis.

In Case 2, we suppose that the rightmost population is f(x − a), a > 0 as
before, that the next rightmost population is f(x), and that the other k − 2
populations have slipped so far to the left that they make no contribution to
the problem of the power. This is an optimistic approach to the power because it
gives an upper bound to the power. When k = 2, Case 1 and Case 2 are identical,
and the power is exactly the power of the test for the particular distribution
function under consideration.

Case 3, which we shall not consider, deals with the situation where there is more
than one rightmost population, but the null hypothesis is false. It is connected
with the fourth kind of error mentioned at the end of section 6.

Table II gives the upper and lower bound of the power of the test for k = 3, 
r = 3,n = 3, when the distribution is uniform and of length unity. The parame- 
ter a is the distance between the location parameter of the rightmost distribu- 
tion and that of the next rightmost distribution. 

In Table III we give some points on the upper and lower bounds of the power 
of the test for the normal distribution with unit standard deviation. The param- 
eter a is the distance between the mean of the rightmost normal distribution and 
the next rightmost, measured in standard deviations. Again we use the case 
k=3,r=3,n = 3. 


TABLE II

Power p of the test for the uniform distribution when k = 3, r = 3, n = 3. The
distance between the midpoints of the two rightmost distributions is a.

Upper bound $p_U$
Lower bound $p_L$


TABLE III

Power p of the test for the unit normal when k = 3, r = 3, n = 3. The distance
between the means of the two rightmost distributions, measured in standard
deviations, is a.

Upper bound $p_U$
Lower bound $p_L$


The power of the test has been defined as the probability of correctly rejecting 
the null hypothesis and finding the sample from the rightmost population to be 
the extreme one. This raises a question about the meaning of the entries in 
Tables II and III under a = 0. When a = 0 there is no way to reject the null 
hypothesis correctly. The probabilities given are the probabilities that a 
randomly chosen sample will force a rejection of the null hypothesis. They 
represent the limit of the power function as a tends to zero. If we think of ear- 
marking the sample from the rightmost population and of computing the prob- 
ability repeatedly that that sample will have three observations larger than all the 
observations in the other sample, and then we let a tend to zero, this is the result 
we get. These values are not the significance levels. The significance level is 
.036. 
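The distinction between the limiting power at a = 0 and the significance level can be illustrated by simulation. The sketch below is a rough Monte Carlo under stated assumptions, not a reproduction of Tables II and III: it takes Case 1 with uniform populations of unit length, earmarks the sample from the shifted population, and estimates the chance that this sample has at least r observations above every observation in the other samples. The function names, the seed and the number of trials are arbitrary choices.

```python
import random


def earmarked_rejects(k, n, r, a, rng):
    """One trial of Case 1: does the sample from the population shifted by a hold
    at least r observations exceeding every observation in the other k - 1 samples?"""
    shifted = [rng.random() + a for _ in range(n)]
    others_max = max(rng.random() for _ in range((k - 1) * n))
    return sum(x > others_max for x in shifted) >= r


def estimate_power(k, n, r, a, trials=200_000, seed=0):
    """Monte Carlo estimate of the power (correct rejection and correct choice)."""
    rng = random.Random(seed)
    hits = sum(earmarked_rejects(k, n, r, a, rng) for _ in range(trials))
    return hits / trials


for a in (0.0, 0.25, 0.5, 1.0):
    print(a, round(estimate_power(3, 3, 3, a), 4))
```

At a = 0 the estimate should be close to one third of the significance level, since any of the k = 3 samples is equally likely to be the extreme one; once a reaches the length of the distribution the shifted sample lies wholly above the others and the estimate is 1.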







8. Discussion. The reader may rightly feel that the solution here presented 
to the problem of the greatest one depends on a trick. That is, it depends 
intimately on the choice of the null hypothesis. Furthermore the reader may 
feel that the choice of $a_1 = a_2 = \cdots = a_k$ is neither an interesting null hypothesis
nor one which is likely to arise in a practical situation. The author has
no quarrel with this attitude. This means that there are many other approaches 
to this problem which are worth trying. The equal-location-parameter case is 
one which yields easily to non-parametric methods. 

It will be noted that a useful technique has been indicated which allows one 
to examine the data before making the significance test. In general one may 
wish to set up a test function, decide which of several samples provides the ex- 
treme value of the function, and then test significance given that we have chosen 
that sample which maximizes the function among the k samples under con- 
sideration. 


9. Conclusion. There is a large class of problems grouped around “the prob- 
lem of the greatest one’’. First it would be useful to have a more powerful test 
than the one here proposed. Second, there is the problem of deciding on the 
basis of samples whether we have successfully predicted the order of the location 
parameters of several populations. Third, there is the general problem of what 
alternatives, what null hypotheses, and what test functions to use in treating 
samples from more than two populations. It is to be hoped that more material 
on these problems will appear, because answers to these questions are urgently 
needed in practical problems. 





ON THE UNIQUENESS OF SIMILAR REGIONS 


By Paul G. Hoel
University of California at Los Angeles 


1. Summary. Conditions are determined for insuring that Neyman’s method 
of constructing similar regions by means of sufficient statistics will yield all such 
regions when such statistics exist. 


2. Introduction. In designing tests of composite hypotheses, one encounters 
the problem of how to construct similar regions and whether the construction 
process yields all possible similar regions. Neyman has derived methods for ob- 
taining similar regions when the basic distribution function satisfies certain par- 
tial differential equations [1] and also when a sufficient set of statistics exists for 
the unknown parameters [2]. In the former case, the construction process gave 
all such regions; however the question of whether certain subregions were inde- 
pendent of the parameters was left unanswered. In the latter case, the indepen- 
dence was obvious, but the question of uniqueness was not considered. In 
obtaining sufficient conditions for the existence of a type B region, Scheffé [3] 
employed Neyman’s differential equations assumptions and methods and demon- 
strated that the subregions were independent of the parameters. 

The method of constructing similar regions by means of sufficient statistics is 
much simpler to demonstrate than is the method based on differential equations. 
It also has the advantage that the independence of the subregions requires no 
proof. It possesses the disadvantage that the question of uniqueness is not 
answered. This question can be answered by showing that the assumption of a 
sufficient set of statistics includes the differential equations assumption and then 
employing methods based on the latter assumption. Such a procedure would 
deprive the sufficiency method of its simplicity; consequently a relatively simple
direct proof of uniqueness has been constructed. The method of proof also shows 
the equivalence of the two methods of constructing similar regions. 


3. Sufficient conditions for uniqueness. Consider a distribution function,
$f(x \mid \theta_1, \cdots, \theta_\nu)$, of the variable $x$ that depends upon the $\nu$ parameters
$\theta_1, \cdots, \theta_\nu$. Let $x_1, x_2, \cdots, x_n$ denote a random sample from this distribution and
let $f(x_1, \cdots, x_n \mid \theta_1, \cdots, \theta_\nu)$ denote the distribution function of such a sample.
It will be assumed that $n > \nu$.

Suppose there exists a sufficient set of statistics $T_1(x_1, \cdots, x_n), \cdots,
T_\nu(x_1, \cdots, x_n)$ with respect to the parameters $\theta_1, \cdots, \theta_\nu$. Koopman [4] has
shown that if the $T$'s are continuous and if $f(x \mid \theta_1, \cdots, \theta_\nu)$ is analytic, then
$f(x \mid \theta_1, \cdots, \theta_\nu)$ must be a function of the form

(1)  $$f(x \mid \theta_1, \cdots, \theta_\nu) = \exp\Big[\sum_{k=1}^{\mu} \Theta_k X_k + \Theta + X\Big]$$




















where the $\Theta_k$ and $\Theta$ are single-valued analytic functions of the $\theta$'s only, and the
$X_k$ and $X$ are single-valued analytic functions of $x$ only. He has also shown that
if $\mu$ assumes its smallest possible value, then

(2)  $$\sum_{i=1}^{n} X_k(x_i) = V_k(T_1, \cdots, T_\nu),$$

where the $V$'s are single-valued functions of the $T$'s. If the preceding conditions
are satisfied, it follows from (1) and (2) that

(3)  $$f(x_1, \cdots, x_n \mid \theta_1, \cdots, \theta_\nu) = \exp\Big[\sum_{k=1}^{\mu} \Theta_k V_k + n\Theta + \sum_{i=1}^{n} X(x_i)\Big].$$
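
As an illustration that is not part of the original argument, the normal distribution can be put in the forms (1) and (3). The sketch below assumes the parameters are the mean m and the standard deviation sigma (names chosen here), so that Theta_1 = m/sigma^2 with X_1 = x, Theta_2 = -1/(2 sigma^2) with X_2 = x^2, Theta = -m^2/(2 sigma^2) - log(sigma sqrt(2 pi)), and X = 0. It then checks numerically that the sample density depends on the data only through V_1 = sum of x_i and V_2 = sum of x_i^2.

```python
import math

def normal_density(x, m, sigma):
    """Ordinary form of the normal density with mean m and standard deviation sigma."""
    return math.exp(-(x - m) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def koopman_terms(m, sigma):
    """Theta_1, Theta_2 and Theta of form (1); here X_1 = x, X_2 = x**2, X = 0."""
    theta_1 = m / sigma ** 2
    theta_2 = -1.0 / (2 * sigma ** 2)
    theta = -m ** 2 / (2 * sigma ** 2) - math.log(sigma * math.sqrt(2 * math.pi))
    return theta_1, theta_2, theta

def sample_density_form_3(xs, m, sigma):
    """Form (3): exp[Theta_1*V_1 + Theta_2*V_2 + n*Theta], with
    V_1 = sum of x_i and V_2 = sum of x_i**2 (the term in X vanishes)."""
    theta_1, theta_2, theta = koopman_terms(m, sigma)
    v_1 = sum(xs)
    v_2 = sum(x * x for x in xs)
    return math.exp(theta_1 * v_1 + theta_2 * v_2 + len(xs) * theta)

xs = [0.3, -1.2, 2.5]
direct = math.prod(normal_density(x, 1.0, 2.0) for x in xs)
print(direct, sample_density_form_3(xs, 1.0, 2.0))
```

In this example mu = nu = 2, which is the case assumed later in the section.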


Now it is known [2] that if the $T$'s possess continuous partial derivatives and
are such that it is possible to introduce additional functions $T_{\nu+1}, \cdots, T_n$ which
will make the transformation

(4)  $$T_1 = T_1(x_1, \cdots, x_n), \quad \cdots, \quad T_n = T_n(x_1, \cdots, x_n)$$

one-to-one, then $f(x_1, \cdots, x_n \mid \theta_1, \cdots, \theta_\nu)$ can be written in the form

(5)  $$f(x_1, \cdots, x_n \mid \theta_1, \cdots, \theta_\nu) = f_1(T_1, \cdots, T_\nu \mid \theta_1, \cdots, \theta_\nu)\, f_2(x_1, \cdots, x_n \mid T_1, \cdots, T_\nu),$$

where $f_1$ is the distribution function of the $T$'s and $f_2$ is the conditional distribution
function of the $x$'s for fixed values of the $T$'s. The function $f_2$ does not depend
upon any of the parameters $\theta_1, \cdots, \theta_\nu$.

For the purpose of constructing similar regions, it is desirable to work with $f_1$.
By combining (3) and (5), $f_1$ may be expressed in the form

(6)  $$f_1(T_1, \cdots, T_\nu \mid \theta_1, \cdots, \theta_\nu) = \exp\Big[\sum_{k=1}^{\mu} \Theta_k V_k + n\Theta + H\Big],$$

where $H = \sum_{i=1}^{n} X(x_i) - \log f_2$ can be expressed as a function of $T_1, \cdots, T_\nu$ only,
and where it is assumed that $f_2 > 0$.

The method employed by Neyman to obtain a similar region of size $\alpha$ is to
build it up as the locus of subregions of size $\alpha$ on the “surfaces” obtained by giving
the $T$'s constant values. Since the size of such a subregion is obtained by integrating
$f_2$ over the subregion, it will depend only upon the $T$'s; consequently a
subregion can be selected that will be of size $\alpha$ for every set of values of the $T$'s.

Now consider the construction of a similar region of size $\alpha$ by building up the
region as the locus of subregions of varying size rather than of constant size on
the surfaces that are obtained by giving the $T$'s constant values. Let $w_1$ and $w_2$
be two regions of size $\alpha$ and let $\alpha_1(T_1, \cdots, T_\nu)$ and $\alpha_2(T_1, \cdots, T_\nu)$ denote the
sizes of the surface subregions. It will be assumed that the regions under consideration
are such that $\alpha_1$ and $\alpha_2$ are obtainable from integrating $f_2$ over the subregion
common to $w_1$ and $w_2$ respectively and the surface determined by fixing
the values of the $T$'s. The problem then is to determine whether two different
functions, $\alpha_1$ and $\alpha_2$, can yield similar regions of size $\alpha$.

Since a critical region can be obtained as the locus of subregions, $\alpha_1$ and $\alpha_2$ will
yield similar regions of size $\alpha$ only if

(7)  $$\int \cdots \int \alpha_j(T_1, \cdots, T_\nu)\, f_1(T_1, \cdots, T_\nu \mid \theta_1, \cdots, \theta_\nu)\, dT_1 \cdots dT_\nu = \alpha \qquad (j = 1, 2),$$

where the integration extends over the range of values of the $T$'s. By means of
(6), condition (7) may be written as

(8)  $$\int \cdots \int \alpha_j \exp\Big[\sum_{k=1}^{\mu} \Theta_k V_k + n\Theta + H\Big]\, dT_1 \cdots dT_\nu = \alpha \qquad (j = 1, 2).$$

If $e^{n\Theta}$ is factored out, it is clear that condition (8) will hold only if

(9)  $$\int \cdots \int \alpha_1 \exp\Big[\sum_{k=1}^{\mu} \Theta_k V_k + H\Big]\, dT_1 \cdots dT_\nu = \int \cdots \int \alpha_2 \exp\Big[\sum_{k=1}^{\mu} \Theta_k V_k + H\Big]\, dT_1 \cdots dT_\nu$$

is an identity in the $\theta$'s, and hence in the $\Theta_k$ for the region in the $\Theta_k$ space that
corresponds to the region in the parameter space for which the parameters
$\theta_1, \cdots, \theta_\nu$ are defined.

Now assume that $\mu = \nu$ and that the transformation

(10)  $$V_1 = V_1(T_1, \cdots, T_\nu), \quad \cdots, \quad V_\nu = V_\nu(T_1, \cdots, T_\nu)$$

is one-to-one. From the preceding assumptions that gave rise to (2) and (4), it
may be shown that the $V$'s are continuous and possess continuous partial derivatives.
In terms of the $V$'s, (9) may therefore be written as

(11)  $$\int \cdots \int \exp\Big[\sum_{k=1}^{\nu} \Theta_k V_k\Big] K_1\, dV_1 \cdots dV_\nu = \int \cdots \int \exp\Big[\sum_{k=1}^{\nu} \Theta_k V_k\Big] K_2\, dV_1 \cdots dV_\nu,$$

where $K_i = \alpha_i e^{H}$ has been expressed in terms of the $V$'s.

Since the parameters will be defined over intervals and $\Theta_k$ is an analytic function
of those parameters, to every region in the parameter space determined by
intervals of the $\theta$'s there will correspond an interval for $\Theta_k$ throughout which $\Theta_k$
will be defined; consequently (11) will be an identity in the $\Theta_k$ for intervals of
values. For every point within regions determined by $\Theta_k$ intervals, the partial
derivatives of the two sides of (11) must therefore be equal, provided the derivatives
exist and provided the $\Theta_k$ are functionally independent.

If the conditions to be imposed shortly are satisfied, it can easily be shown that
it is permissible to differentiate (11) repeatedly under the integral signs with respect
to the $\Theta_k$. As a consequence, (11) implies that for all sets of non-negative
integers $k_1, \cdots, k_\nu$,

(12)  $$\int \cdots \int V_1^{k_1} \cdots V_\nu^{k_\nu} \exp\Big[\sum_{k=1}^{\nu} \Theta_k V_k\Big] K_1\, dV_1 \cdots dV_\nu = \int \cdots \int V_1^{k_1} \cdots V_\nu^{k_\nu} \exp\Big[\sum_{k=1}^{\nu} \Theta_k V_k\Big] K_2\, dV_1 \cdots dV_\nu$$

will be an identity in the $\Theta_k$ for almost all values of the $\Theta_k$. But (12) is equivalent
to requiring that

(13)  $$\int \cdots \int V_1^{k_1} \cdots V_\nu^{k_\nu}\, g_1(V_1, \cdots, V_\nu)\, dV_1 \cdots dV_\nu = \int \cdots \int V_1^{k_1} \cdots V_\nu^{k_\nu}\, g_2(V_1, \cdots, V_\nu)\, dV_1 \cdots dV_\nu$$

shall hold for all sets of non-negative integers $k_1, \cdots, k_\nu$, where $g_1$ and $g_2$ are
the integrands of (11) after they have been divided by the function of the $\Theta_k$ obtained
from integrating (11). Since $g_1$ and $g_2$ will then be non-negative functions
of the $V$'s whose integrals over all values of the $V$'s are one, they are distribution
functions of the $V$'s. If $g_1$ and $g_2$ possess moments of all orders and are such that
they are uniquely determined by their moments, then condition (13) implies that

(14)  $$g_1(V_1, \cdots, V_\nu) = g_2(V_1, \cdots, V_\nu).$$

This identity will hold for almost all values of the parameters. If the conditions
necessary to justify (14) are satisfied, it therefore follows that

$$\alpha_1(T_1, \cdots, T_\nu) = \alpha_2(T_1, \cdots, T_\nu)$$

and that Neyman's method of constructing similar regions by choosing
$\alpha(T_1, \cdots, T_\nu) = \alpha$ yields all possible similar regions of the class of regions
being considered.

The conditions that were imposed on $f(x \mid \theta_1, \cdots, \theta_\nu)$ in order to establish
uniqueness may be summarized as follows: The distribution function
$f(x \mid \theta_1, \cdots, \theta_\nu)$ is analytic and possesses a set of sufficient statistics, $T_1, \cdots, T_\nu$,
with respect to the parameters $\theta_1, \cdots, \theta_\nu$, that are continuous and possess continuous
partial derivatives. There exist one-to-one transformations of the types
(4) and (10). The function $c\, e^{\sum \Theta_k V_k + H}$, treated as a distribution function of the
$V$'s, possesses moments of all orders and is uniquely determined by its moments.
Finally, the $\Theta_k$ are functionally independent with the smallest possible value of
$\mu$ equal to $\nu$.

If the assumption that the $\Theta_k$ are independent is not realized, the distribution
function (1) could be expressed in terms of fewer than $\nu$ parameters. This is
also true if $\mu < \nu$. The two assumptions that $\mu = \nu$ and that the $\Theta_k$ are independent
will therefore be satisfied if (1) is expressed in terms of the minimum number
of parameters. The remaining assumptions can often be checked quite easily
whenever a particular distribution function is given.

In deriving tests of hypotheses for certain parameters, the distribution function
$f(x \mid \theta_1, \cdots, \theta_\nu)$ will of course contain those parameters in addition to the parameters
$\theta_1, \cdots, \theta_\nu$, but since they will have fixed values, it was not necessary to
introduce them into the discussion.


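4. Equivalence of methods. Although the equivalence of the two methods
of constructing similar regions has been implied in the literature [1], no simple
demonstration seems to be available. Such a demonstration is easily given by
means of (3). Let

$$\varphi_i = \frac{\partial \log f}{\partial \theta_i},$$

where $f$ is given by (3) with $\mu = \nu$, and let

$$\varphi_{ij} = \frac{\partial \varphi_i}{\partial \theta_j}.$$

Differentiation of (3) yields

(15)  $$\varphi_i = \sum_{k=1}^{\nu} \frac{\partial \Theta_k}{\partial \theta_i} V_k + n \frac{\partial \Theta}{\partial \theta_i}, \qquad \varphi_{ij} = \sum_{k=1}^{\nu} \frac{\partial^2 \Theta_k}{\partial \theta_i\, \partial \theta_j} V_k + n \frac{\partial^2 \Theta}{\partial \theta_i\, \partial \theta_j}.$$

The differential equations that are assumed to hold in the other method of construction
[1] may be written in the form

(16)  $$\varphi_{ij} = A_{ij} + \sum_{k=1}^{\nu} B_{ijk}\, \varphi_k, \qquad (i, j = 1, \cdots, \nu),$$

where the $A_{ij}$ and $B_{ijk}$ are functions of the $\theta$'s only. Upon substituting the
values given by (15), it will be found that (16) will be satisfied if

(17)  $$\frac{\partial^2 \Theta_l}{\partial \theta_i\, \partial \theta_j} = \sum_{k=1}^{\nu} B_{ijk} \frac{\partial \Theta_l}{\partial \theta_k}, \qquad (l = 1, \cdots, \nu),$$

and

$$A_{ij} + n \sum_{k=1}^{\nu} B_{ijk} \frac{\partial \Theta}{\partial \theta_k} = n \frac{\partial^2 \Theta}{\partial \theta_i\, \partial \theta_j}.$$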






Since (17) represents a set of $\nu$ equations in the $B_{ijk}$'s, whose coefficient matrix is
non-singular because of the functional independence of the $\Theta_k$, it follows that
sets of $A$'s and $B$'s can be found to satisfy equations (16). This shows that the
sufficiency assumption includes the differential equations assumption.

Now the method of constructing similar regions here consists in building them
up as the locus of subregions of size $\alpha$ on the surfaces obtained by giving the $\varphi_i$
constant values. But from (15) it follows that the surface $\varphi_i = c_i$ $(i = 1, \cdots, \nu)$
is equivalent to the surface

$$\sum_{k=1}^{\nu} \frac{\partial \Theta_k}{\partial \theta_i} V_k + n \frac{\partial \Theta}{\partial \theta_i} = c_i, \qquad (i = 1, \cdots, \nu),$$

which may be written in the form

(18)  $$\sum_{k=1}^{\nu} \frac{\partial \Theta_k}{\partial \theta_i} V_k = c_i', \qquad (i = 1, \cdots, \nu),$$

because $\Theta$ is a function of the parameters only. Since the coefficient matrix of the
$V$'s in (18) is nonsingular, (18) may be solved for the $V$'s; consequently the surface
$\varphi_i = c_i$ $(i = 1, \cdots, \nu)$ is equivalent to the surface $V_i = c_i''$ $(i = 1, \cdots, \nu)$.
But from the assumption concerning the transformation (10), the surface
$V_i = c_i''$ $(i = 1, \cdots, \nu)$ is equivalent to the surface $T_i = c_i'''$ $(i = 1, \cdots, \nu)$.
Thus, the two surfaces $\varphi_i = c_i$ $(i = 1, \cdots, \nu)$ and $T_i = c_i'''$ $(i = 1, \cdots, \nu)$ are
equivalent and hence the two methods of constructing similar regions are
equivalent.


REFERENCES 

[1] J. Neyman, “On a statistical problem arising in routine analysis and in sampling
inspections of mass production,” Annals of Math. Stat., Vol. 12 (1941), pp. 46-76.

[2] J. Neyman, “Outline of a theory of statistical estimation based on the classical theory
of probability,” Roy. Soc. Phil. Trans., Vol. 236A (1937), p. 364.

[3] H. Scheffé, “On the theory of testing composite hypotheses with one constraint,”
Annals of Math. Stat., Vol. 13 (1942), pp. 280-293.

[4] B. O. Koopman, “On distributions admitting a sufficient statistic,” Trans. Amer. Math.
Soc., Vol. 39 (1936), pp. 399-409.





NOTES 
This section is devoted to brief research and expository articles and other short items.


CONVERGENCE OF DISTRIBUTIONS 


By HERBERT ROBBINS 
University of North Carolina 


Let $f_n(x)$ $(n = 0, 1, 2, \cdots)$ be frequency functions

(1)  $$f_n(x) \ge 0, \qquad \int_{-\infty}^{\infty} f_n(x)\, dx = 1.$$

There are various ways in which the sequence of distributions corresponding to
the $f_n(x)$ $(n = 1, 2, \cdots)$ may be said to converge to the distribution corresponding
to $f_0(x)$. The definition customarily adopted in mathematical statistics
(see e.g. [1]) is equivalent to the condition

(a)  $$\lim_{n \to \infty} \int_{-\infty}^{\xi} f_n(x)\, dx = \int_{-\infty}^{\xi} f_0(x)\, dx \quad \text{for every } \xi.^{1}$$

We shall also consider the two further conditions

(b)  $$\lim_{n \to \infty} \int_{S} f_n(x)\, dx = \int_{S} f_0(x)\, dx \quad \text{for every Borel set } S,$$

and

(c)  $$\lim_{n \to \infty} \int_{S} f_n(x)\, dx = \int_{S} f_0(x)\, dx \quad \text{uniformly for all Borel sets } S.$$

It is clear that (c) implies (b) and that (b) implies (a). That the converse 
implications do not hold is shown by the following examples. 

EXAMPLE 1. Let $f_0(x) = 1$ for $0 < x < 1$ and 0 elsewhere. Choose and fix
any $0 < \epsilon < 1$, set $\delta_n = \epsilon / (n \cdot 2^n)$, and for $n = 1, 2, \cdots$ let $f_n(x) = 1/(n \cdot \delta_n)$ for
$i/n - \delta_n < x < i/n$ $(i = 1, 2, \cdots, n)$ and 0 elsewhere. If we denote by $S_n$
the set of all $x$ for which $f_n(x) > 0$ it is easy to see that for $n = 1, 2, \cdots$

(2)  $$0 \le \int_{-\infty}^{\xi} f_0(x)\, dx - \int_{-\infty}^{\xi} f_n(x)\, dx < 1/n \quad \text{for every } \xi,$$

(3)  $$\int_{S_n} f_0(x)\, dx = \epsilon / 2^n, \qquad \int_{S_n} f_n(x)\, dx = 1.$$

¹ From a well known theorem of Pólya the convergence is then necessarily uniform for all $\xi$.





Hence for the Borel set $S = \sum_{n=1}^{\infty} S_n$ it follows that

(4)  $$\int_{S} f_0(x)\, dx \le \sum_{n=1}^{\infty} \int_{S_n} f_0(x)\, dx = \epsilon,$$

(5)  $$\int_{S} f_n(x)\, dx = \int_{S_n} f_n(x)\, dx = 1, \qquad (n = 1, 2, \cdots).$$

From (2) we see that (a) holds (uniformly for all $\xi$), and from (4) and (5) that
(b) fails about as badly as possible.
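
The construction of Example 1 can be checked with exact rational arithmetic. The following is a minimal sketch, with function names chosen here for illustration: it evaluates the distribution functions of f_0 and f_n on a finite grid (so it is evidence for inequality (2), not a proof) and computes the measure of S_n, which equals the f_0-probability of S_n.

```python
from fractions import Fraction

def delta(n, eps):
    """delta_n = eps / (n * 2**n), as in Example 1."""
    return eps / (n * 2 ** n)

def cdf_f0(xi):
    """Distribution function of the uniform density f_0 on (0, 1)."""
    return min(max(xi, Fraction(0)), Fraction(1))

def cdf_fn(xi, n, eps):
    """Distribution function of f_n, which puts mass 1/n uniformly on each
    interval (i/n - delta_n, i/n), i = 1, ..., n."""
    d = delta(n, eps)
    total = Fraction(0)
    for i in range(1, n + 1):
        left = Fraction(i, n) - d
        covered = min(max(xi - left, Fraction(0)), d)  # length of the overlap
        total += covered / (n * d)
    return total

def measure_of_support(n, eps):
    """m(S_n) = n * delta_n = eps / 2**n."""
    return n * delta(n, eps)

eps = Fraction(1, 100)
n = 5
grid = [Fraction(j, 1000) for j in range(-100, 1101)]
gaps = [cdf_f0(x) - cdf_fn(x, n, eps) for x in grid]
print(min(gaps) >= 0, max(gaps) < Fraction(1, n))
print(sum(measure_of_support(m, eps) for m in range(1, 40)) < eps)
```

The distribution functions stay within 1/n of each other, while the union of the sets S_n carries f_0-probability at most eps and f_n-probability 1 for every n.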

This construction can be modified to apply to any $f_0(x)$; thus choosing $f_0(x) =
(2\pi e^{x^2})^{-1/2}$ we can construct $f_n(x)$ $(n = 1, 2, \cdots)$ and a Borel set $S$ such that

$$\lim_{n \to \infty} \int_{-\infty}^{\xi} f_n(x)\, dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\xi} e^{-x^2/2}\, dx \quad \text{uniformly for all } \xi,$$

$$\frac{1}{\sqrt{2\pi}} \int_{S} e^{-x^2/2}\, dx = .01, \qquad \int_{S} f_n(x)\, dx = 1, \qquad (n = 1, 2, \cdots).$$

It is conceivable that some time a statistician, failing to consider such a possibility,
will be led to approximate .01 by 1.

If $X_n$ is a random variable with frequency function $f_n(x)$, if $y = g(x)$ is a Borel
function, and if (a) holds, then it follows from Example 1 that the distribution
function $H_n(y)$ of $Y_n = g(X_n)$, equal to the integral of $f_n(x)$ over the set $S_y$ of all
$x$ such that $g(x) \le y$, need not converge to the distribution function $H_0(y)$ of
$Y_0 = g(X_0)$. It is easily seen that this possibility is excluded if, as commonly
occurs in applications, $g(x)$ is such that for every $y$, the intersection of $S_y$ with
any finite interval is the sum of a finite number of intervals (e.g., if $g(x) = \sin x$).

EXAMPLE 2. Let $f_0(x)$ be defined as in the previous example, and for $n =
1, 2, \cdots$ let $f_n(x) = 1 + \sin(2\pi n x)$ for $0 < x < 1$ and 0 elsewhere. By the
Riemann-Lebesgue theorem it follows that (b) holds. But let $S_n$ denote the
set of all $x$ for which $f_n(x) > 1$; then

$$\int_{S_n} f_0(x)\, dx = \tfrac{1}{2}, \qquad \int_{S_n} f_n(x)\, dx = \tfrac{1}{2} + 1/\pi, \qquad (n = 1, 2, \cdots),$$

so that (c) does not hold.
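
The two integrals of Example 2 can be verified directly, since the set where sin(2 pi n x) is positive on (0, 1) is the union of the n intervals (j/n, j/n + 1/(2n)). The sketch below uses the antiderivative of 1 + sin(2 pi n x); the function names are illustrative. It also shows the contrast with a fixed interval, over which the oscillating part integrates to at most 1/(pi n) in absolute value.

```python
import math

def prob_f0_on_Sn(n):
    """f_0-probability of S_n: n intervals, each of length 1/(2n)."""
    return sum(1.0 / (2 * n) for _ in range(n))

def prob_fn_on_Sn(n):
    """Integral of 1 + sin(2*pi*n*x) over S_n, using the antiderivative
    x - cos(2*pi*n*x) / (2*pi*n)."""
    def antiderivative(x):
        return x - math.cos(2 * math.pi * n * x) / (2 * math.pi * n)
    total = 0.0
    for j in range(n):
        a = j / n
        b = a + 1.0 / (2 * n)
        total += antiderivative(b) - antiderivative(a)
    return total

def interval_gap(n, a, b):
    """|P_n((a, b)) - P_0((a, b))| for a fixed interval inside (0, 1)."""
    return abs(math.cos(2 * math.pi * n * a) - math.cos(2 * math.pi * n * b)) / (2 * math.pi * n)

for n in (1, 2, 7, 50):
    print(n, prob_f0_on_Sn(n), prob_fn_on_Sn(n), interval_gap(n, 0.1, 0.37))
```

The gap on S_n stays at 1/pi for every n, whereas the gap on the fixed interval tends to zero, which is the distinction between (c) and (b).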

It follows from these examples that (a), (b), and (c) are successively stronger
definitions of convergence. We shall now give some definitions equivalent to
(b) and (c).

First we recall that the non-negative, completely additive, and absolutely continuous
set functions

(6)  $$P_n(S) = \int_{S} f_n(x)\, dx,$$

are said to be uniformly absolutely continuous if for every $\epsilon > 0$ there exists a
$\delta > 0$ such that for any $S$ and any $n = 1, 2, \cdots$,

(7)  $$m(S) < \delta \quad \text{implies} \quad P_n(S) < \epsilon.$$

We shall denote the condition that the $P_n(S)$ be uniformly absolutely continuous
by (u.a.c.), and we shall now prove that (b) is equivalent to

(b')  (a) and (u.a.c.).


PROOF. (A) Suppose (b) holds. It is clear that (a) holds, and we shall show
by contradiction that (u.a.c.) holds also. For if not then there would exist an
$\epsilon > 0$ such that for any $\eta > 0$ we could find a set $S$ and an integer $n$ such that

(8)  $$m(S) < \eta, \qquad P_n(S) > \epsilon.$$

Moreover, since the set function

$$P_0(S) = \int_{S} f_0(x)\, dx$$

is absolutely continuous, there exists a $\delta > 0$ such that

(9)  $$m(S) < \delta \quad \text{implies} \quad P_0(S) < \epsilon/2.$$

Now by (8) there exists an $S_1$ with $m(S_1) < \delta/2$ and a $k_1$ such that $P_{k_1}(S_1) > \epsilon$.
Next, there exists an $S_2$ with $m(S_2) < \delta/2^2$ and a $k_2$ such that $P_{k_2}(S_2) > \epsilon$, and
it is easy to see that we may assume that $k_2 > k_1$. Proceeding in this way we
find a sequence of integers $k_1 < k_2 < \cdots$ and of sets $S_1, S_2, \cdots$ such that

(10)  $$m(S_n) < \delta/2^n, \qquad P_{k_n}(S_n) > \epsilon, \qquad (n = 1, 2, \cdots).$$

Let $S = \sum_{n=1}^{\infty} S_n$; then by (10), $m(S) \le \sum_{n=1}^{\infty} m(S_n) < \delta$, so that by (9),

(11)  $$P_0(S) < \epsilon/2.$$

But by (10),

(12)  $$P_{k_n}(S) \ge P_{k_n}(S_n) > \epsilon, \qquad (n = 1, 2, \cdots).$$

From (11) and (12) we conclude that (b) does not hold, which is a contradiction.
Hence (b) implies (b').

(B) Suppose (b') holds. We shall show first that (b) holds for any set $S_1$
of finite measure. Choose any $\epsilon > 0$; then from (u.a.c.) it follows that there
exists a $\delta > 0$ such that

(13)  $$m(S) < \delta \quad \text{implies} \quad P_n(S) < \epsilon/8 \qquad (n = 0, 1, 2, \cdots).$$

It is known from the theory of measure that corresponding to $S_1$ and to $\delta$ we can
find a set $S_2$ which is the sum of a finite number of disjoint intervals, such that

(14)  $$m((S_1 - S_2) + (S_2 - S_1)) < \delta.$$

From (13), (14), and the relations

(15)  $$P_n(S_1) = P_n(S_2) + P_n(S_1 - S_2) - P_n(S_2 - S_1), \qquad (n = 0, 1, 2, \cdots),$$

it follows that

(16)  $$|P_0(S_1) - P_n(S_1)| \le |P_0(S_2) - P_n(S_2)| + P_n(S_1 - S_2) + P_n(S_2 - S_1) + P_0(S_1 - S_2) + P_0(S_2 - S_1) < |P_0(S_2) - P_n(S_2)| + \epsilon/2,$$

and from (a) that for large enough $n$,

(17)  $$|P_0(S_2) - P_n(S_2)| < \epsilon/2.$$

Thus from (16) and (17) it follows that for large enough $n$,

(18)  $$|P_0(S_1) - P_n(S_1)| < \epsilon,$$

which proves (b) for the case $m(S) < \infty$.

Now given any $\epsilon > 0$ choose $\alpha$, $\beta$ so that, setting $A = \{\alpha < x < \beta\}$, we have

(19)  $$P_0(A) > 1 - \epsilon/4.$$

Then it follows from (a) that for large enough $n$,

(20)  $$P_n(A) > 1 - \epsilon/2.$$

Then for any Borel set $S$ we have for large enough $n$,

$$P_n(S) - P_0(S) = P_n(SA) + P_n(S - A) - P_0(SA) - P_0(S - A),$$

$$|P_n(S) - P_0(S)| \le |P_n(SA) - P_0(SA)| + P_n(S - A) + P_0(S - A) < |P_n(SA) - P_0(SA)| + \epsilon/2 + \epsilon/4.$$

But by the previous case, since $m(SA) < \infty$, for large enough $n$ we shall have
$|P_n(SA) - P_0(SA)| < \epsilon/4$. Hence for large enough $n$,

$$|P_n(S) - P_0(S)| < \epsilon,$$

so that (b) holds in this case also. This completes the proof.
We shall say that $\lim_{n \to \infty} f_n(x) = f_0(x)$ in measure if for every $\epsilon > 0$ and for
every set $A$ such that $m(A) < \infty$, the measure of the set of all $x$ in $A$ for which
$|f_n(x) - f_0(x)| > \epsilon$ tends to 0 as $n$ increases. (For a space of finite measure
this reduces to the usual definition.) We now observe that (c) is equivalent to

(c')  $$\lim_{n \to \infty} f_n(x) = f_0(x) \quad \text{in measure.}$$

In fact, it is easy to show that (c) is equivalent to convergence in the mean of
order 1,

(c'')  $$\lim_{n \to \infty} \int_{-\infty}^{\infty} |f_n(x) - f_0(x)|\, dx = 0,$$







which implies (c'), and a theorem of Scheffé [2] states that (c') implies (c).²
Finally, it is not hard to show that the condition

(d)  $$\lim_{n \to \infty} f_n(x) = f_0(x) \quad \text{almost everywhere}$$

implies (c') but not conversely.

Summing up, we arrive at the following complete set of implication relations
among the various modes of convergence which we have considered:

(20)  (d) → (c') ↔ (c'') ↔ (c) → (b') ↔ (b) → (a).


REFERENCES 


[1] H. Cramér, Mathematical Methods of Statistics, Princeton Univ. Press, 1946, pp. 58-60.

[2] H. Scheffé, “A useful convergence theorem for probability distributions,” Annals of
Math. Stat., Vol. 18 (1947), pp. 434-438.

[3] E. J. McShane, Integration, Princeton Univ. Press, 1944, p. 168.


ON RANDOM VARIABLES WITH COMPARABLE PEAKEDNESS 


By Z. W. Birnbaum


University of Washington 


The quality of a distribution usually referred to as its peakedness has often 
been measured by the fourth moment of the distribution. It is known, however, 
that there is no definite connection between the value of the fourth moment and 
what one may intuitively consider as the amount of peakedness of a distribution.¹
In the present paper a definition of relative peakedness is proposed and it is shown 
that this concept has properties which may make it practically applicable. 

DEFINITION. Let $Y$ and $Z$ be real random variables and $Y_1$ and $Z_1$ real constants.
We shall say that $Y$ is more peaked about $Y_1$ than $Z$ about $Z_1$ if the inequality

$$P(|Y - Y_1| \ge T) \le P(|Z - Z_1| \ge T)$$

is true for all $T \ge 0$.

If, for example, $Y$ and $Z$ are normal random variables with expectations $Y_1$
and $Z_1$ and standard deviations $\sigma_Y$ and $\sigma_Z$, and if $\sigma_Y < \sigma_Z$, then $Y$ is more peaked
about $Y_1$ than $Z$ about $Z_1$. Similarly, if $Y$ is a random variable such that
$P(Y < a) = P(Y > b) = 0$ for $a < b$, and if $Z$ is the discrete random variable
with $P(Z = a) = P(Z = b) = \tfrac{1}{2}$, then $Y$ is more peaked about $\tfrac{1}{2}(a + b)$ than
$Z$ about the same point.
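
The normal example can be illustrated numerically. The sketch below uses the fact that for a normal variable with standard deviation sigma the two-sided tail probability about its mean is erfc(T / (sigma sqrt 2)). The helper names are chosen here, and a check on a finite grid of T values is only evidence for the inequality, not a proof that it holds for all T.

```python
import math

def normal_two_sided_tail(t, sigma):
    """P(|Y - mean| >= t) for a normal variable with standard deviation sigma."""
    return math.erfc(t / (sigma * math.sqrt(2.0)))

def more_peaked_on_grid(tail_y, tail_z, grid):
    """True if P(|Y - Y_1| >= T) <= P(|Z - Z_1| >= T) at every grid point T >= 0."""
    return all(tail_y(t) <= tail_z(t) + 1e-15 for t in grid)

grid = [0.05 * j for j in range(0, 201)]
narrow = lambda t: normal_two_sided_tail(t, 1.0)
wide = lambda t: normal_two_sided_tail(t, 2.0)
print(more_peaked_on_grid(narrow, wide, grid))   # narrower normal vs wider normal
print(more_peaked_on_grid(wide, narrow, grid))   # the reverse comparison
```

Because the definition compares tail probabilities at every T, it gives only a partial ordering: two distributions need not be comparable at all.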


² Scheffé actually proves that (d) implies (c), but the Lebesgue convergence theorem on
which his proof is based holds for convergence in measure (see e.g. [3]).

¹ I. Kaplansky, “A common error concerning kurtosis,” Am. Stat. Assn. Jour., Vol. 40
(1945), p. 259.







LEMMA. Let $Y_1, Y_2, Z_1, Z_2$ be continuous random variables² with the probability
densities $\varphi_1(Y_1), \varphi_2(Y_2), f_1(Z_1), f_2(Z_2)$ such that
1°. $Y_1$ and $Y_2$ are independent, $Z_1$ and $Z_2$ are independent,
2°. $\varphi_i(Y_i) = \varphi_i(-Y_i)$ for all $Y_i$, $f_i(Z_i) = f_i(-Z_i)$ for all $Z_i$, $(i = 1, 2)$,
3°. $\varphi_2(Y_2)$ and $f_1(Z_1)$ are not-increasing functions for positive values of the variables, and
4°. $Y_i$ is more peaked about 0 than $Z_i$, for $i = 1, 2$.
Let $Y = Y_1 + Y_2$ and $Z = Z_1 + Z_2$. Under these assumptions $Y$ is more peaked
about 0 than $Z$.

PROOF: Let $\Phi_i(y) = P(Y_i \le y)$, $F_i(z) = P(Z_i \le z)$, for $i = 1, 2$, be the cumulative
probability functions. For any random variables $Y_1, Y_2, Z_1, Z_2$ (not necessarily
continuous) which fulfil assumption 1° we have, for any $T$, the relationships

$$\begin{aligned}
P(Y \le T) - P(Z \le T) &= \int_{-\infty}^{+\infty} \big[\Phi_1(T - s)\, d\Phi_2(s) - F_1(T - s)\, dF_2(s)\big] \\
&= \int_{-\infty}^{+\infty} [\Phi_1(T - s) - F_1(T - s)]\, d\Phi_2(s) + \int_{-\infty}^{+\infty} F_1(T - s)\,[d\Phi_2(s) - dF_2(s)] \\
&= \int_{-\infty}^{+\infty} [\Phi_1(T - s) - F_1(T - s)]\, d\Phi_2(s) - \int_{-\infty}^{+\infty} [\Phi_2(s) - F_2(s)]\, dF_1(T - s) \\
&= \int_{-\infty}^{+\infty} [\Phi_1(T - s) - F_1(T - s)]\, d\Phi_2(s) + \int_{-\infty}^{+\infty} [\Phi_2(T - s) - F_2(T - s)]\, dF_1(s) \\
&= I_1(T) + I_2(T),
\end{aligned}$$

where

$$\begin{aligned}
I_1(T) &= \int_{-\infty}^{+\infty} [\Phi_1(T - s) - F_1(T - s)]\, d\Phi_2(s) \\
&= \int_{-\infty}^{+\infty} [\Phi_1(-s) - F_1(-s)]\, d\Phi_2(T + s) \\
&= \int_{0}^{+\infty} \big\{[F_1(s) - \Phi_1(s)]\, d\Phi_2(T - s) + [\Phi_1(-s) - F_1(-s)]\, d\Phi_2(T + s)\big\},
\end{aligned}$$

etc.

² As defined e.g. in H. Cramér, Mathematical Methods of Statistics, Princeton University
Press, 1946, p. 169.







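If the random variables have distributions symmetrical about zero (assumption
2°) this is equal to

$$\begin{aligned}
&\int_{0}^{+\infty} \big\{[P(Z_1 \le s) - P(Y_1 \le s)]\, dP(Y_2 \le T - s) + [P(Y_1 \le -s) - P(Z_1 \le -s)]\, dP(Y_2 \le T + s)\big\} \\
&= \int_{0}^{+\infty} \big\{[1 - P(Z_1 > s) - 1 + P(Y_1 > s)]\, dP(Y_2 \le T - s) + [P(Y_1 \ge s) - P(Z_1 \ge s)]\, dP(Y_2 \le T + s)\big\} \\
&= \int_{0}^{+\infty} \big\{[P(Y_1 \ge s) - P(Z_1 \ge s)]\, d[P(Y_2 \le T + s) + P(Y_2 \le T - s)] - [P(Y_1 = s) - P(Z_1 = s)]\, dP(Y_2 \le T - s)\big\},
\end{aligned}$$

and we obtain

(1.1)  $$I_1(T) = \int_{0}^{+\infty} [P(Y_1 \ge s) - P(Z_1 \ge s)]\, d[P(Y_2 \le T + s) + P(Y_2 \le T - s)] - \int_{0}^{+\infty} [P(Y_1 = s) - P(Z_1 = s)]\, dP(Y_2 \le T - s).$$

By an analogous argument one derives the equality

(1.2)  $$I_2(T) = \int_{0}^{+\infty} [P(Y_2 \ge s) - P(Z_2 \ge s)]\, d[P(Z_1 \le T + s) + P(Z_1 \le T - s)] - \int_{0}^{+\infty} [P(Y_2 = s) - P(Z_2 = s)]\, dP(Z_1 \le T - s).$$

Making use of the assumption that $Y_1, Y_2, Z_1, Z_2$ are continuous random variables,
we conclude that the second integrals in (1.1) and (1.2) are zero, and we
may write

(2.1)  $$I_1(T) = \int_{0}^{+\infty} [P(Y_1 \ge s) - P(Z_1 \ge s)]\,[\varphi_2(T + s) - \varphi_2(T - s)]\, ds,$$

(2.2)  $$I_2(T) = \int_{0}^{+\infty} [P(Y_2 \ge s) - P(Z_2 \ge s)]\,[f_1(T + s) - f_1(T - s)]\, ds.$$

For $T \ge 0$ we have, making use of assumption 3°,

$$\varphi_2(T + s) - \varphi_2(T - s) \le 0 \quad \text{for } 0 \le s \le T,$$

$$\varphi_2(T + s) - \varphi_2(T - s) = \varphi_2(s + T) - \varphi_2(s - T) \le 0 \quad \text{for } 0 \le T \le s,$$

and similarly

$$f_1(T + s) - f_1(T - s) \le 0 \quad \text{for all } T \ge 0 \text{ and } s \ge 0.$$

Since according to assumption 4° we also have

$$P(Y_1 \ge s) - P(Z_1 \ge s) \le 0,$$

$$P(Y_2 \ge s) - P(Z_2 \ge s) \le 0 \quad \text{for } s \ge 0,$$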


















both integrands in (2.1) and (2.2) are non-negative for all values of $s$, and we
conclude

$$P(Y \le T) - P(Z \le T) = I_1(T) + I_2(T) \ge 0,$$

and hence

(3.1)  $$P(Y \ge T) - P(Z \ge T) \le 0 \quad \text{for } T \ge 0.$$

From assumption 2° one easily sees that $Y$ and $Z$ have symmetrical probability
distributions. This together with (3.1) leads to

$$P(Y \ge T) - P(Z \ge T) = P(Y \le -T) - P(Z \le -T) \le 0,$$

and thus to

$$P(|Y| \ge T) - P(|Z| \ge T) \le 0 \quad \text{for } T \ge 0.$$

As can be seen from (1.1) and (1.2), the assumptions of the Lemma, in particular
the assumption that all variables are continuous and the assumption 3°,
are rather special sufficient conditions for $Y$ being more peaked about 0 than $Z$.

THEOREM 1. Let $Y$ and $Z$ be continuous random variables with probability
densities $\varphi(Y)$ and $f(Z)$ such that

1°. $\varphi(-Y) = \varphi(Y)$ for all $Y$, $f(-Z) = f(Z)$ for all $Z$,

2°. $\varphi(Y)$ and $f(Z)$ are not-increasing functions for positive values of the variables,

3°. $Y$ is more peaked about 0 than $Z$.

Let $Y_1, Y_2, \cdots, Y_n$ and $Z_1, Z_2, \cdots, Z_n$ be random samples of $Y$ and $Z$, respectively,
and $\bar{Y}_n = \frac{1}{n} \sum_{j=1}^{n} Y_j$, $\bar{Z}_n = \frac{1}{n} \sum_{j=1}^{n} Z_j$. Then $\bar{Y}_n$ is more peaked about 0 than
$\bar{Z}_n$.

















PROOF. From the preceding Lemma one concludes by simple induction that
$Y' = Y_1 + Y_2 + \cdots + Y_n$ as well as $Z' = Z_1 + Z_2 + \cdots + Z_n$ are continuous
random variables with distributions symmetrical about zero and probability
densities not-increasing for positive values of the variables, such that $Y'$ is more
peaked about 0 than $Z'$. From this the theorem follows immediately.

The conjecture that assumption 2° of Theorem 1 might be superfluous is incorrect
as may be seen from the following example:

Let $Y$ be any continuous random variable with a distribution symmetrical
about zero and such that $P(|Y| > a) = 0$ for some $a > 0$. Let $Z$ be the discrete
random variable with $P(Z = -a) = P(Z = a) = \tfrac{1}{2}$. We have for $0 < T \le a$

$$P(|Y| \ge T) \le 1 = P(|Z| \ge T),$$

hence $Y$ is more peaked about 0 than $Z$. If $Y_1, Y_2$ and $Z_1, Z_2$ are random samples
of size 2, we have

$$P(\bar{Z}_2 = -a) = P(\bar{Z}_2 = a) = \tfrac{1}{4}, \qquad P(\bar{Z}_2 = 0) = \tfrac{1}{2},$$

and thus

$$P(|\bar{Z}_2| \ge T) = \tfrac{1}{2} \quad \text{for } 0 < T \le a.$$

The random variable $\bar{Y}_2$ is continuous, with a distribution symmetrical about
zero, such that $P(|\bar{Y}_2| \le a) = 1$. There exists, therefore, a $T_1$ such that
$0 < T_1 \le a$ and that $P(|\bar{Y}_2| \ge T_1) = \tfrac{3}{4}$. It follows that

$$P(|\bar{Y}_2| \ge T_1) = \tfrac{3}{4} > \tfrac{1}{2} = P(|\bar{Z}_2| \ge T_1),$$

hence $\bar{Y}_2$ is not more peaked about zero than $\bar{Z}_2$. The random variable $Z$ is
discrete, but it can be approximated by a continuous random variable with a
U-shaped probability density, so that all the probabilities will be modified only
very slightly and $\bar{Y}_2$ still will not be more peaked than $\bar{Z}_2$. Nothing will change
in this example if one assumes that $Y$ fulfils condition 2° of Theorem 1.
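
A concrete instance of this counterexample can be worked out in closed form. The sketch below assumes, purely for illustration, that Y is uniform on (-a, a), which satisfies condition 2° of Theorem 1. The mean of two such variables has a triangular density on (-a, a), so its two-sided tail probability is (1 - T/a)^2 for 0 <= T <= a. The mean of two observations of the two-point variable Z takes the values -a, 0, a with probabilities 1/4, 1/2, 1/4.

```python
import math

def tail_mean_two_uniform(t, a):
    """P(|Ybar_2| >= t) when Y is uniform on (-a, a) (an assumed example)."""
    if t <= 0:
        return 1.0
    if t >= a:
        return 0.0
    return (1.0 - t / a) ** 2

def tail_mean_two_point(t, a):
    """P(|Zbar_2| >= t) when P(Z = -a) = P(Z = a) = 1/2."""
    if t <= 0:
        return 1.0
    if t <= a:
        return 0.5
    return 0.0

a = 1.0
t1 = a * (1.0 - math.sqrt(3.0) / 2.0)   # solves (1 - t/a)**2 = 3/4
print(t1, tail_mean_two_uniform(t1, a), tail_mean_two_point(t1, a))
```

At this T_1 the mean of the two uniform observations has the larger tail probability, so it is not more peaked about zero than the mean of the two-point observations, although each single Y is more peaked than each single Z.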

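THEOREM 2. Let $Y$ be a continuous random variable such that

1°. $\varphi(-Y) = \varphi(Y)$ for all $Y$,

2°. $\varphi(Y)$ is a not-increasing function for $Y > 0$,

3°. $P(|Y| > a) = 0$ for some $a > 0$.

Let $Y_1, Y_2, \cdots, Y_n$ be a random sample of size $n$ and $\bar{Y}_n = \frac{1}{n} \sum_{j=1}^{n} Y_j$. Then,
for any $y \ge 0$, we have

(4.1)  $$P(|\bar{Y}_n| \ge y) \le W_n\!\left(\frac{y}{a}\right),$$

where

$$W_n(t) = \frac{2}{n!} \sum_{n(t+1)/2 \,<\, k \,\le\, n} (-1)^k \binom{n}{k} \left[\frac{n}{2}(t + 1) - k\right]^n.$$

PROOF. Let $Z$ be the random variable with uniform distribution in the
interval $-1 \le Z \le 1$. If $Z_1, Z_2, \cdots, Z_n$ is a random sample, then $Z' =
Z_1 + Z_2 + \cdots + Z_n$ has the cumulative probability function³ which, written
for $\bar{Z}_n = Z'/n$, is

$$P(\bar{Z}_n \le t) = \begin{cases} 0, & t < -1, \\[4pt] \dfrac{1}{n!} \displaystyle\sum_{0 \,\le\, k \,\le\, n(t+1)/2} (-1)^k \binom{n}{k} \left[\dfrac{n}{2}(t + 1) - k\right]^n, & -1 \le t \le 1, \\[4pt] 1, & t > 1. \end{cases}$$

³ This expression is due to Laplace. For derivation and discussion, see: J. V. Uspensky,
Introduction to Mathematical Probability, McGraw-Hill, 1937, p. 279, and Cramér, op. cit.,
p. 245.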







Thus,

$$P(|\bar{Z}_n| \ge t) = 2[1 - P(\bar{Z}_n \le t)] = 2\left[1 - \frac{1}{n!} \sum_{0 \,\le\, k \,\le\, n(t+1)/2} (-1)^k \binom{n}{k} \left[\frac{n}{2}(t + 1) - k\right]^n\right],$$

and in view of the identity

$$\sum_{k=0}^{n} (-1)^k \binom{n}{k} (u - k)^n = n!$$

this becomes

$$P(|\bar{Z}_n| \ge t) = \frac{2}{n!} \sum_{n(t+1)/2 \,<\, k \,\le\, n} (-1)^k \binom{n}{k} \left[\frac{n}{2}(t + 1) - k\right]^n = W_n(t)$$

for $0 \le t \le 1$. The random variable $Y/a$ is obviously more peaked about zero
than $Z$. Since $Y/a$ and $Z$ fulfil the assumptions of Theorem 1, it follows that
$\bar{Y}_n / a$ is more peaked about zero than $\bar{Z}_n$, that is

$$P\!\left(\left|\frac{\bar{Y}_n}{a}\right| \ge t\right) \le P(|\bar{Z}_n| \ge t) = W_n(t) \quad \text{for } t \ge 0.$$

Setting $at = y$, one obtains (4.1).

For $n \to \infty$ the function $W_n(t)$ approaches asymptotically the probability
$P(|X| \ge t\sqrt{3n})$ for the normalized normal random variable $X$.⁴ For $n = 8$
one obtains the following values which indicate a good approximation:

t                         .3998    .5254    .6711
P(|X| >= t sqrt(24))      .05      .01      .001
W_8(t)                    .049     .0092    .0005

For smaller values of $n$, $W_n(t)$ can be easily computed.
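
The computation can be sketched as follows. The code assumes that W_n(t) is the Laplace expression (2/n!) times the sum, over integers k with n(t+1)/2 < k <= n, of (-1)^k C(n, k) [n(t+1)/2 - k]^n, which is the two-sided tail probability of the mean of n independent variables uniform on (-1, 1). The normal comparison uses the identity P(|X| >= z) = erfc(z / sqrt 2). Function names are illustrative.

```python
import math

def w_n(n, t):
    """W_n(t) = P(|Zbar_n| >= t) for the mean of n independent uniform(-1, 1) variables."""
    if t <= 0:
        return 1.0
    if t >= 1:
        return 0.0
    u = n * (t + 1.0) / 2.0
    total = 0.0
    for k in range(math.floor(u) + 1, n + 1):
        total += (-1) ** k * math.comb(n, k) * (u - k) ** n
    return 2.0 * total / math.factorial(n)

def normal_two_sided_tail(z):
    """P(|X| >= z) for a standard normal variable X."""
    return math.erfc(z / math.sqrt(2.0))

for t in (0.3998, 0.5254, 0.6711):
    print(t, normal_two_sided_tail(t * math.sqrt(24.0)), w_n(8, t))
```

For n = 1 the function reduces to 1 - t and for n = 2 to (1 - t)^2, which gives a quick check of the summation limits.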




A METHOD FOR OBTAINING RANDOM NUMBERS 


By H. Burke Horton 


Interstate Commerce Commission 


The need for large quantities of random numbers to be used in sample design, 
subsampling, and other statistical problems is well known. Tippett’s [1] num- 
bers have been widely used for these purposes, despite criticism directed at 
their lack of randomness. The following procedure may be of interest to those 


⁴ Cramér, op. cit., p. 245.













who wish to develop their own random series. The method described below will 
ultimately be used to record extensive tables of random numbers for general use. 

Current methods of producing random numbers usually depend upon single 
operations of mechanical or electronic devices. These may be described as
“single-stage” random number processes. The numerical results are biased to
the same extent as the devices from which they are taken.

At this point it is desirable to describe a process which may be called “compound”
randomization. Assume two roulette wheels arranged in series so that
the first controls the arrangement of symbols on the second wheel, while a turn
of the second wheel determines which of its positions is to be observed. If the
decimal system is used, the first wheel would have 10! “equally likely” positions,
and the second would have 10 “equally likely” positions. If three such wheels
were to be chained, the first would require (10!)! positions, the second 10! positions,
and the third 10 positions. In general, if n wheels were to be chained,
the first would require $10(!)^{n-1}$ “equally likely” positions. It is not practical
to design such a machine.¹

One method of surmounting these difficulties is to shift to the binary system 
in order to take advantage of the fact that 2! = 2; or, in general, $2(!)^{n} = 2$.
This property makes feasible the chaining of any number of machines in series; 
and, furthermore, the machines can be of the same design. If desired, the re- 
sults taken from a single machine may be chained. Another important feature 
is the ease of handling binary chains by electronic systems. 

The words “equally likely” have been placed in quotation marks thus far to 
indicate that the probabilities are as nearly equal as manufacturing precision 
permits. Any simple single-stage device will have some bias, and it is this very 
lack of true equality that the chaining process is designed to meet. For con- 
venience we may take as our binary symbols +1 and —1 rather than the custom- 
ary 1 and 0. We adhere to the usual rules regarding the sign of a product. 

Let $p_i$ be the probability of obtaining +1 in the $i$th trial (or in the $i$th
machine of a chain of machines), $0 < p_i < 1$; $q_i = 1 - p_i$ represents the
probability of obtaining −1 in the $i$th trial.

Let $P_i$ be the probability of obtaining +1 as the product of $i$ trials;
$Q_i = 1 - P_i$ is the probability of obtaining −1 as the product of $i$ trials.
The following relationships can be set down immediately:



















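$$
\begin{aligned}
P_1 &= p_1, & Q_1 &= q_1,\\
P_2 &= P_1 p_2 + Q_1 q_2, & Q_2 &= P_1 q_2 + Q_1 p_2,\\
P_3 &= P_2 p_3 + Q_2 q_3, & Q_3 &= P_2 q_3 + Q_2 p_3,\\
&\;\;\vdots & &\;\;\vdots\\
P_i &= P_{i-1} p_i + Q_{i-1} q_i, & Q_i &= P_{i-1} q_i + Q_{i-1} p_i.
\end{aligned}
$$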


1 It has been pointed out by Dr. George W. Brown that a practical solution is possible
using any number base, n, by addition of random digits (0, 1, 2, ..., n − 1) modulo n.

























We may calculate the bias, $P_k - \tfrac12$, for a chain of k trials:

$$P_k - \tfrac12 = \tfrac12 (P_k - Q_k)
  = \tfrac12 (P_{k-1} p_k + Q_{k-1} q_k - P_{k-1} q_k - Q_{k-1} p_k).$$

Factoring, we have

$$P_k - \tfrac12 = \tfrac12 (P_{k-1} - Q_{k-1})(p_k - q_k).$$

Substituting for $P_{k-1} - Q_{k-1}$ and factoring again,

$$P_k - \tfrac12 = \tfrac12 (P_{k-2} - Q_{k-2})(p_{k-1} - q_{k-1})(p_k - q_k).$$

Continuing the process of substituting and factoring, we obtain

$$P_k - \tfrac12 = \tfrac12 (p_1 - q_1)(p_2 - q_2) \cdots (p_k - q_k),$$

$$P_k - \tfrac12 = \tfrac12 \prod_{i=1}^{k} (p_i - q_i) = \tfrac12 \prod_{i=1}^{k} (2p_i - 1). \tag{1}$$

We may write the general formula for $P_k$:

$$P_k = \tfrac12 \Bigl[ 1 + \prod_{i=1}^{k} (2p_i - 1) \Bigr]. \tag{2}$$

In the special case where all the $p_i$ are equal to a constant, p,

$$P_k = \tfrac12 \bigl[ 1 + (2p - 1)^k \bigr]. \tag{3}$$

This can also be derived directly by expansion of $(p - q)^k$.

If any machine, r, in the chain has no bias ($p_r = \tfrac12$, exactly), the chain
itself has no bias, since $2p_r - 1 = 0$. Note also that if for all i, $0 < p_i < 1$,
the bias of the complete chain is less than the bias of any component (single or
multiple) taken from the chain, because $|2p_i - 1| < 1$. Or stated another way, the
results taken from any machine, no matter how nearly perfect, can be improved
by chaining with another machine, no matter how biased the latter. Even in
the limiting case, p = 1 (or 0), the magnitude of the bias remains unchanged;
in all other cases it is reduced. The bias of final results can be made as small as
desired by increasing the length of the chain. Compound randomization can be
regarded as an attrition process which may be used to reduce final bias below
any preassigned quantity. If the observations taken from two machines in the
chain should be perfectly correlated, the only effect is to shorten the chain by
two.
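
As a rough numerical illustration of the chaining argument, the following Python sketch compares the step-by-step recurrence for the probability of a +1 product with the closed form (2). The three machine probabilities are made-up illustrative values, and the function names are hypothetical; exact rational arithmetic is used so that the two results can be compared without rounding.

```python
from fractions import Fraction


def chain_probability_recurrence(ps):
    """P_k from the recurrence P_i = P_{i-1} p_i + Q_{i-1} q_i, with P_1 = p_1."""
    P = ps[0]
    for p in ps[1:]:
        Q, q = 1 - P, 1 - p
        P = P * p + Q * q
    return P


def chain_probability_closed_form(ps):
    """P_k = (1/2) [1 + prod (2 p_i - 1)], i.e. formula (2)."""
    product = Fraction(1)
    for p in ps:
        product *= 2 * p - 1
    return (1 + product) / 2


# Assumed single-machine probabilities of +1 (illustrative values only).
machine_probabilities = [Fraction(6, 10), Fraction(7, 10), Fraction(55, 100)]

p_recurrence = chain_probability_recurrence(machine_probabilities)
p_closed = chain_probability_closed_form(machine_probabilities)
chain_bias = p_closed - Fraction(1, 2)
component_biases = [p - Fraction(1, 2) for p in machine_probabilities]

print(p_recurrence, p_closed, chain_bias)
```

With these assumed inputs the chain bias comes out smaller in magnitude than the bias of each individual machine, which is the behaviour the paragraph above describes.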

In shifting from the binary system to the decimal system, symbol bias will be 
introduced. In general, symbol bias will be introduced in passing from a given 
positional system to any other positional system, unless one of the number bases 
is a rational power of the other. 

To illustrate, let us assume that we have a random binary series and wish to 
obtain a random one-digit decimal series. It will be necessary to tabulate the 
binary series in blocks of four symbols. The quantities will range from 0000 
(binary) to 1111 (binary), or from 00 (decimal) to 15 (decimal), with equal 













probabilities. There would be no predominance of either ones or zeros in the 
overall binary tabulation, as illustrated in the table below. 

















        Binary System                 Decimal System

            0000                            0
            0001                            1
            0010                            2
            0011                            3
            0100                            4
            0101                            5
            0110                            6
            0111                            7
            1000                            8
            1001                            9

    Tabulation to this point:
        25 zeros, 15 ones             One of each symbol

            1010                           10
            1011                           11
            1100                           12
            1101                           13
            1110                           14
            1111                           15

    Overall tabulation:
        32 zeros, 32 ones             (Right digit only)
                                      0-5, 2 each
                                      6-9, 1 each
















However, if we look at the right digit of the decimal tabulation, it is clear that the 
symbols 0 to 5, inclusive, will occur twice as often as the symbols 6 to 9, inclusive. 
The easiest way of correcting for this bias is simply to reject all two-digit decimal 
numbers which occur, thereby giving equal probabilities to the ten decimal sym- 
bols. The rejection could be accomplished most easily by electronic devices 
operating on the binary numbers. All numbers greater than 1001 (binary) 
would be excluded through the operation of a simple four-stage electronic 
counter. 
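
The symbol bias and its removal by rejection can be tallied directly. The sketch below is only an illustration with hypothetical function names: it enumerates the sixteen equally likely four-bit values, counts the right-hand decimal digit of each, and then counts the digits that survive when every value above 1001 (binary) is rejected.

```python
from collections import Counter


def right_digit_counts(block_bits=4):
    """Count the right-hand decimal digit of every equally likely block value."""
    return Counter(value % 10 for value in range(2 ** block_bits))


def accepted_digit_counts(block_bits=4):
    """Count the digits left after rejecting every block value above 1001 (binary)."""
    return Counter(value for value in range(2 ** block_bits) if value <= 0b1001)


biased = right_digit_counts()
unbiased = accepted_digit_counts()

print([biased[d] for d in range(10)])
print([unbiased[d] for d in range(10)])
```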

This simple illustration also demonstrates the inefficiency of converting
random four-digit binary numbers to random one-digit decimal numbers. 37.5%
of the data are lost in the process of removing bias. A more efficient procedure
would be to tabulate the random binary series in blocks of ten digits. The
largest number that could occur would be 1 111 111 111 (binary), or 1,023
(decimal). The numbers would have equal probabilities insofar as this is attainable
by chaining. To obtain a random three-digit decimal series it would be necessary
to reject the numbers above 999 (decimal). This would amount to only
2.34% of the available data. As before, rejection could be accomplished easily
in the binary series by use of a ten-stage electronic counter.
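
The two loss figures quoted above follow from simple counting, and the block conversion itself is short to write down. The following is a minimal sketch, assuming the binary series is available as a list of 0s and 1s; the function names and the sample bit list are hypothetical.

```python
def rejection_loss(block_bits, decimal_digits):
    """Fraction of blocks rejected when block_bits binary digits yield decimal_digits digits."""
    total = 2 ** block_bits
    kept = 10 ** decimal_digits
    if kept > total:
        raise ValueError("block too short for the requested number of decimal digits")
    return (total - kept) / total


def decimal_blocks(bits, block_bits, decimal_digits):
    """Convert successive binary blocks to zero-padded decimal strings, rejecting large values."""
    limit = 10 ** decimal_digits
    accepted = []
    for start in range(0, len(bits) - block_bits + 1, block_bits):
        value = 0
        for bit in bits[start:start + block_bits]:
            value = 2 * value + bit
        if value < limit:  # values of limit or more are discarded
            accepted.append(str(value).zfill(decimal_digits))
    return accepted


print(rejection_loss(4, 1))    # four-bit blocks, one decimal digit
print(rejection_loss(10, 3))   # ten-bit blocks, three decimal digits
print(decimal_blocks([1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1], 4, 1))
```

In the sample call the middle block, 1010 (binary), exceeds 9 and is dropped, so only two digits are returned from three blocks.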

Several promising devices are being considered for tabulating random numbers 
in accordance with the principles discussed herein. Electronic or electrical 
systems actuated by cosmic rays seem to be the most desirable. Tabulating 
equipment may be wired to turn out random numbers, possibly as a by-product 
of other card runs. 

If only a few random numbers are needed, they can be obtained by much 
simpler methods. For example, a coin may be tossed, letting heads and tails 
represent +1 and −1, respectively. The product of k successive tosses would
be tabulated as the random binary variable. Products equal to +1 and —1 
would be coded as 1 and 0, respectively. Blocks of binary symbols would then 
be converted to the decimal system as described above. 
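
The coin-tossing procedure can be imitated on a computer. The sketch below is an assumption-laden stand-in, not the tabulating method contemplated in the text: a seeded pseudo-random generator plays the role of the coin, `p_heads` is a hypothetical probability of heads, and the function names are invented for illustration.

```python
import random


def compound_bit(k, p_heads, rng):
    """Product of k simulated tosses (+1 for heads, -1 for tails), coded as 1 or 0."""
    product = 1
    for _ in range(k):
        product *= 1 if rng.random() < p_heads else -1
    return 1 if product == 1 else 0


def compound_bits(count, k, p_heads, seed=0):
    """A list of `count` compound binary symbols, each built from k tosses."""
    rng = random.Random(seed)
    return [compound_bit(k, p_heads, rng) for _ in range(count)]


# A coin assumed to give heads 60% of the time, chained eight tosses at a time.
sample = compound_bits(20000, 8, 0.6)
print(sum(sample) / len(sample))
```

Even with so biased a simulated coin, the proportion of 1s should lie very near one half, since formula (3) gives a bias of only $\tfrac12 (0.2)^8$ for k = 8.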


REFERENCE 


[1] Tippett, L. H. C., Random Sampling Numbers, Tracts for Computers, No. 15, Cambridge
University Press, 1927.




NOTE ON THE ERROR IN INTERPOLATION OF A FUNCTION OF TWO 
INDEPENDENT VARIABLES 


By W. M. Kincaid
University of Michigan 


Suppose that g is a function of one real variable x and h is an interpolation
function such that $g(x) = h(x)$ for $x = x_1, x_2, \cdots, x_n$. Let
$f(x) = g(x) - h(x)$ and suppose that $\frac{d^n}{dx^n} f(x)$ exists in an interval
containing the points $x_0, x_1, \cdots, x_n$. Then the error in interpolation may
be estimated from the well-known relation

$$f(x_0) = \frac{f^{(n)}(\xi)}{n!}\,(x_0 - x_1)(x_0 - x_2) \cdots (x_0 - x_n) \tag{1}$$

where $\xi$ is some point in the smallest interval containing $x_0, x_1, \cdots, x_n$.

In the most usual case, where h(x) is a polynomial of degree less than n, we
have $f^{(n)}(\xi) = g^{(n)}(\xi)$.
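
Relation (1) can be checked exactly in a case where the n-th derivative is constant, so that the unknown intermediate point plays no part. The sketch below makes several assumptions for illustration: the function $g(x) = x^3$, the nodes 0, 1, 2, the point $x_0 = 3$, and a hypothetical helper `lagrange_interpolant` that builds the polynomial h of degree less than n = 3.

```python
from fractions import Fraction
from math import factorial


def lagrange_interpolant(nodes, values):
    """Return h, the polynomial of degree < len(nodes) passing through the given points."""
    def h(x):
        total = Fraction(0)
        for i, (xi, yi) in enumerate(zip(nodes, values)):
            term = Fraction(yi)
            for j, xj in enumerate(nodes):
                if j != i:
                    term *= Fraction(x - xj, xi - xj)
            total += term
        return total
    return h


def g(x):
    return x ** 3


nodes = [0, 1, 2]
h = lagrange_interpolant(nodes, [g(x) for x in nodes])
x0 = 3

actual_error = g(x0) - h(x0)

third_derivative = 6  # g'''(x) = 6 everywhere for g(x) = x^3
predicted_error = Fraction(third_derivative, factorial(len(nodes)))
for xi in nodes:
    predicted_error *= (x0 - xi)

print(actual_error, predicted_error)
```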

It is natural to consider the corresponding situation for functions of two
independent real variables x and y. Let g and h be two functions such that
$g(x, y) = h(x, y)$ for n points $x = x_i$, $y = y_i$ $(i = 1, 2, \cdots, n)$. Setting
$f(x, y) = g(x, y) - h(x, y)$ as before, we have $f(x_i, y_i) = 0$ for
$i = 1, 2, \cdots, n$. Then if $(x_0, y_0)$ is a point at which g and h are defined,
we may ask whether there is any formula corresponding to (1) from which the error
$f(x_0, y_0)$ can be estimated.

Some restrictions must be placed upon the function f if any interesting results
are to be obtained. Let us suppose that $f(x, y)$ can be expanded in a Taylor
series about each of the points $(x_i, y_i)$ $(i = 0, 1, \cdots, n)$ with a region of
convergence sufficient to include all the points of the set. These conditions are
more stringent ones than will be required for obtaining the later results; on the
other hand, they would almost always be satisfied in any practical problem of
interpolation, so it scarcely seems worthwhile to look for the weakest possible
conditions at this point.

The first case of real interest is n = 3. It follows from the general statement
of Taylor’s theorem with the remainder that

$$
\begin{aligned}
0 = f(x_i, y_i) ={}& f(x_0, y_0) + (x_i - x_0) f_x(x_0, y_0) + (y_i - y_0) f_y(x_0, y_0)\\
&+ \tfrac12 \bigl[ (x_i - x_0)^2 f_{xx}(\xi_i, \eta_i) + 2(x_i - x_0)(y_i - y_0) f_{xy}(\xi_i, \eta_i)\\
&\qquad + (y_i - y_0)^2 f_{yy}(\xi_i, \eta_i) \bigr] \qquad (i = 1, 2, 3),
\end{aligned}
\tag{2}
$$

where $(\xi_i, \eta_i)$ is a point on the line segment joining $(x_0, y_0)$ and
$(x_i, y_i)$ for i = 1, 2, 3.

The equation (2) may be regarded as a set of three linear equations in the two
quantities $f_x(x_0, y_0)$ and $f_y(x_0, y_0)$. The condition that these shall be
consistent is

$$
\begin{vmatrix}
f(x_0, y_0) + U_1 & x_1 - x_0 & y_1 - y_0\\
f(x_0, y_0) + U_2 & x_2 - x_0 & y_2 - y_0\\
f(x_0, y_0) + U_3 & x_3 - x_0 & y_3 - y_0
\end{vmatrix} = 0 \tag{3}
$$

where

$$U_i = \tfrac12 \bigl[ (x_i - x_0)^2 f_{xx}(\xi_i, \eta_i) + 2(x_i - x_0)(y_i - y_0) f_{xy}(\xi_i, \eta_i) + (y_i - y_0)^2 f_{yy}(\xi_i, \eta_i) \bigr] \qquad (i = 1, 2, 3).$$


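If the three points $(x_i, y_i)$ $(i = 1, 2, 3)$ are not in a straight line, (3) can be
written in the form

$$
f(x_0, y_0) = -\,
\frac{\begin{vmatrix}
U_1 & x_1 - x_0 & y_1 - y_0\\
U_2 & x_2 - x_0 & y_2 - y_0\\
U_3 & x_3 - x_0 & y_3 - y_0
\end{vmatrix}}
{\begin{vmatrix}
1 & x_1 & y_1\\
1 & x_2 & y_2\\
1 & x_3 & y_3
\end{vmatrix}} \tag{4}
$$

This expression is analogous to (1), though far less simple and elegant in form.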
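
The three-point result can be tried numerically when the second derivatives are constant, since the intermediate points then do not matter. The sketch below reads formula (4) as the negative ratio of the determinant with rows $(U_i,\ x_i - x_0,\ y_i - y_0)$ to the determinant with rows $(1,\ x_i,\ y_i)$. Everything specific in it is an assumption made for illustration: the quadratic g, the three points, the point $(x_0, y_0)$, and the helper names.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def g(x, y):
    """Assumed test function with constant second derivatives."""
    return x * x + x * y + 2 * y * y


G_XX, G_XY, G_YY = 2.0, 1.0, 4.0   # second partial derivatives of g

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]   # interpolation points, not collinear
x0, y0 = 0.5, 0.5                                # point where the error is wanted


def plane_through(points, values):
    """Coefficients (a, b, c) of h(x, y) = a + b x + c y through three points (Cramer's rule)."""
    base = det3([[1.0, x, y] for x, y in points])
    coefficients = []
    for column in range(3):
        matrix = [[1.0, x, y] for x, y in points]
        for row, value in zip(matrix, values):
            row[column] = value
        coefficients.append(det3(matrix) / base)
    return coefficients


a, b, c = plane_through(points, [g(x, y) for x, y in points])
direct_error = g(x0, y0) - (a + b * x0 + c * y0)


def U(x, y):
    """Second-order remainder term; h is linear, so derivatives of f equal those of g."""
    dx, dy = x - x0, y - y0
    return 0.5 * (dx * dx * G_XX + 2 * dx * dy * G_XY + dy * dy * G_YY)


numerator = det3([[U(x, y), x - x0, y - y0] for x, y in points])
denominator = det3([[1.0, x, y] for x, y in points])
formula_error = -numerator / denominator

print(direct_error, formula_error)
```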
A similar treatment can evidently be used in all cases of the type
$n = \dfrac{m(m+1)}{2}$.







For example, for n = 6 the equation corresponding to (4) is

$$
f(x_0, y_0) = -\,
\frac{\begin{vmatrix}
V_1 & x_1 - x_0 & y_1 - y_0 & (x_1 - x_0)^2 & (x_1 - x_0)(y_1 - y_0) & (y_1 - y_0)^2\\
V_2 & x_2 - x_0 & y_2 - y_0 & (x_2 - x_0)^2 & (x_2 - x_0)(y_2 - y_0) & (y_2 - y_0)^2\\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots\\
V_6 & x_6 - x_0 & y_6 - y_0 & (x_6 - x_0)^2 & (x_6 - x_0)(y_6 - y_0) & (y_6 - y_0)^2
\end{vmatrix}}
{\begin{vmatrix}
1 & x_1 & y_1 & x_1^2 & x_1 y_1 & y_1^2\\
1 & x_2 & y_2 & x_2^2 & x_2 y_2 & y_2^2\\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots\\
1 & x_6 & y_6 & x_6^2 & x_6 y_6 & y_6^2
\end{vmatrix}} \tag{5}
$$

where

$$
\begin{aligned}
V_i = \tfrac16 \bigl[ & (x_i - x_0)^3 f_{xxx}(\xi_i, \eta_i) + 3 (x_i - x_0)^2 (y_i - y_0) f_{xxy}(\xi_i, \eta_i)\\
& + 3 (x_i - x_0)(y_i - y_0)^2 f_{xyy}(\xi_i, \eta_i) + (y_i - y_0)^3 f_{yyy}(\xi_i, \eta_i) \bigr] \qquad (i = 1, 2, \cdots, 6).
\end{aligned}
$$

(Equation (5) breaks down only if the six points $(x_1, y_1), \cdots, (x_6, y_6)$ lie on a
single conic.)


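As an example of the general case we may consider n = 4. We write

$$
\begin{aligned}
f(x_i, y_i) ={}& f(x_0, y_0) + (x_i - x_0) f_x(x_0, y_0) + (y_i - y_0) f_y(x_0, y_0)\\
&+ \tfrac12 \bigl[ (x_i - x_0)^2 f_{xx}(\xi_i, \eta_i) + 2 (x_i - x_0)(y_i - y_0) f_{xy}(\xi_i, \eta_i)\\
&\qquad + (y_i - y_0)^2 f_{yy}(\xi_i, \eta_i) \bigr] \qquad (i = 1, 2, 3, 4).
\end{aligned}
$$

Now,

$$f_{xx}(\xi_i, \eta_i) = f_{xx}(x_0, y_0) + (\xi_i - x_0) f_{xxx}(\xi_i', \eta_i') + (\eta_i - y_0) f_{xxy}(\xi_i', \eta_i')$$

where $(\xi_i', \eta_i')$ is a point on the line segment between $(x_0, y_0)$ and
$(\xi_i, \eta_i)$. Proceeding as before yields

$$
f(x_0, y_0) = -\,
\frac{\begin{vmatrix}
W_1 & x_1 - x_0 & y_1 - y_0 & (x_1 - x_0)^2\\
W_2 & x_2 - x_0 & y_2 - y_0 & (x_2 - x_0)^2\\
W_3 & x_3 - x_0 & y_3 - y_0 & (x_3 - x_0)^2\\
W_4 & x_4 - x_0 & y_4 - y_0 & (x_4 - x_0)^2
\end{vmatrix}}
{\begin{vmatrix}
1 & x_1 & y_1 & x_1^2\\
1 & x_2 & y_2 & x_2^2\\
1 & x_3 & y_3 & x_3^2\\
1 & x_4 & y_4 & x_4^2
\end{vmatrix}} \tag{6}
$$

with

$$
\begin{aligned}
W_i = \tfrac12 \bigl[ & (x_i - x_0)^2 (\xi_i - x_0) f_{xxx}(\xi_i', \eta_i') + (x_i - x_0)^2 (\eta_i - y_0) f_{xxy}(\xi_i', \eta_i')\\
& + 2 (x_i - x_0)(y_i - y_0) f_{xy}(\xi_i, \eta_i) + (y_i - y_0)^2 f_{yy}(\xi_i, \eta_i) \bigr].
\end{aligned}
$$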


Corresponding formulas can be derived in this way for any value of n; in fact,
several alternatives may be obtained in each case. In all cases the error $f(x_0, y_0)$
is given in terms of the derivatives of g alone if a polynomial of a certain type is
used for the interpolating function. For equation (4), the suitable polynomial
would be $h(x, y) = a + bx + cy$; for (5), $h(x, y) = a + bx + cy + dx^2 + exy + fy^2$;
for (6), $h(x, y) = a + bx + cy + dx^2$. If the interpolating function $h(x, y)$
is not so chosen, the formulas remain valid, but derivatives of h will appear.

The same procedure is applicable to functions of any number of independent 
variables. 




ON A LEMMA BY KOLMOGOROFF 


By Kai-Lai Chung


Princeton University 






The following lemma was proved by Kolmogoroff [1]:
If $e_1, e_2, \cdots, e_n$ are independent events and U an arbitrary event such that
($W(X)$ denoting the probability of X and $W_e(X)$ the conditional probability of X
under the hypothesis of e)

$$W_{e_k}(U) \ge u \quad (k = 1, \cdots, n), \qquad W(e_1 + \cdots + e_n) \ge u,$$

then

$$W(U) \ge \tfrac19 u^2.$$






This result seems of some interest in itself and may also have practical
applications, for it is easily seen that [2] in general if $e_1, e_2, \cdots, e_n$ are
arbitrary no information about $W_{e_1 + \cdots + e_n}(U)$ can be obtained from that
about $W_{e_k}(U)$, $k = 1, \cdots, n$. From this point of view the constant 1/9 is
interesting, though it is unimportant in Kolmogoroff’s proof of the law of large
numbers. Using his original method this constant can easily be improved to 1/8.
However, the following method will give a better result. At the same time we shall
put it into a more general form.


Let

$$W_{e_k}(U) \ge \alpha \quad (k = 1, \cdots, n), \qquad \sum_{k=1}^{n} W(e_k) \ge \beta.$$

Then we have for $1 \le k \le n$,

$$W(U) \ge W\bigl(U(e_1 + \cdots + e_k)\bigr) = W(Ue_1 + \cdots + Ue_k). \tag{1}$$

Now a simple case of certain inequalities due to Bonferroni and Fréchet [3]
states that for arbitrary events $E_1, \cdots, E_k$ we have

$$W(E_1 + \cdots + E_k) \ge \sum_{i=1}^{k} W(E_i) - \sum_{1 \le i < j \le k} W(E_i E_j). \tag{2}$$
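
The inequality just quoted, namely that the probability of a union is at least the sum of the single probabilities less the sum of the pairwise joint probabilities, can be spot-checked by brute force. The sketch below is only an empirical illustration on an assumed finite sample space of equally likely points, with randomly generated events and a hypothetical helper `bonferroni_gap`; it is not a proof.

```python
import random
from fractions import Fraction
from itertools import combinations


def probability(event, space_size):
    """Probability of an event (a set of points) when all points are equally likely."""
    return Fraction(len(event), space_size)


def bonferroni_gap(events, space_size):
    """W(E_1 + ... + E_k) minus [sum W(E_i) - sum_{i<j} W(E_i E_j)]; should never be negative."""
    union = set().union(*events)
    singles = sum(probability(e, space_size) for e in events)
    pairs = sum(probability(a & b, space_size) for a, b in combinations(events, 2))
    return probability(union, space_size) - (singles - pairs)


rng = random.Random(0)
space_size = 12
gaps = []
for _ in range(200):
    # Four arbitrary events, each point included with an assumed probability 0.4.
    events = [{pt for pt in range(space_size) if rng.random() < 0.4} for _ in range(4)]
    gaps.append(bonferroni_gap(events, space_size))

print(min(gaps))
```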


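Applying this to (1), we obtain

$$
\begin{aligned}
W(U) &\ge \sum_{i=1}^{k} W(Ue_i) - \sum_{1 \le i < j \le k} W(Ue_i e_j)\\
&\ge \sum_{i=1}^{k} W(e_i) W_{e_i}(U) - \sum_{1 \le i < j \le k} W(e_i) W(e_j),
\end{aligned}
$$

using the independence of $e_1, \cdots, e_k$. Hence

$$W(U) \ge \alpha \sum_{i=1}^{k} W(e_i) - \frac12 \Bigl( \sum_{i=1}^{k} W(e_i) \Bigr)^2 + \frac12 \sum_{i=1}^{k} W(e_i)^2.$$

By Cauchy’s inequality,

$$\sum_{i=1}^{k} W(e_i)^2 \ge \frac1k \Bigl( \sum_{i=1}^{k} W(e_i) \Bigr)^2.$$

Writing $\Sigma_k = \sum_{i=1}^{k} W(e_i)$, we have

$$W(U) \ge \Bigl[ \alpha - \frac12 \Bigl( 1 - \frac1k \Bigr) \Sigma_k \Bigr] \Sigma_k. \tag{3}$$

Now let $0 < \gamma < \gamma_0 \le 1$ where $\gamma$ and $\gamma_0$ are to be determined
later. If there is an $e_i$, $1 \le i \le n$, such that $W(e_i) \ge \gamma\beta$, then

$$W(U) \ge W(Ue_i) = W(e_i) W_{e_i}(U) \ge \gamma\alpha\beta. \tag{4}$$

If every $W(e_i) < \gamma\beta$, we determine k (> 1) such that

$$\Sigma_{k-1} < \gamma_0 \beta \le \Sigma_k;$$

thus

$$\Sigma_k < \Sigma_{k-1} + \gamma\beta < (\gamma_0 + \gamma)\beta.$$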


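And (3) yields

$$W(U) \ge \Bigl[ \alpha - \frac12 \Bigl( 1 - \frac1n \Bigr) (\gamma_0 + \gamma) \beta \Bigr] \gamma_0 \beta. \tag{5}$$

Now we choose $\gamma$ so that the last terms in (4) and (5) be equal. This gives

$$\gamma = \frac{2\alpha - \bigl( 1 - \frac1n \bigr) \gamma_0 \beta}{2\alpha + \bigl( 1 - \frac1n \bigr) \gamma_0 \beta}\, \gamma_0.$$

To maximize $\gamma$, we put $\dfrac{d\gamma}{d\gamma_0} = 0$ and find

$$\gamma_0 = \frac{2(\sqrt2 - 1)\alpha}{\beta}.$$

If $2(\sqrt2 - 1)\alpha \le \beta$, this choice of $\gamma_0$ is admissible, and we obtain

$$\gamma = \frac{2 - \sqrt2 + \frac1n (\sqrt2 - 1)}{\sqrt2 - \frac1n (\sqrt2 - 1)} \cdot \frac{2(\sqrt2 - 1)\alpha}{\beta}.$$

Thus we get (the first inequality being retained for small values of n)

$$W(U) \ge \frac{2 - \sqrt2 + \frac1n (\sqrt2 - 1)}{\sqrt2 - \frac1n (\sqrt2 - 1)}\, 2(\sqrt2 - 1)\alpha^2 > 2(\sqrt2 - 1)^2 \alpha^2 > \tfrac13 \alpha^2. \tag{6}$$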


'n case 2(4+/2 — 1)a > B, we choose yo = 1, and we obtain 
+ (1 ‘a t)é 
ree are Hy 


Thus we get 





If we write B = na, we have 






(7) 










We summarize (6) and (7) in the following table: 





B/a = 2(/2 — 1) = <2V2-1 
t re 2 3 2-17 2 
W(U) > 2(V/2 — 1)’a 255," 


Thus for Kolmogoroff’s case (n = 1) we have W(U) = 3a’. 


3 


REFERENCES 


[1] A. Kolmogoroff, “Bemerkungen zu meiner Arbeit ‘Über die Summen zufälliger
Grössen’,” Math. Annalen, Vol. 102 (1929), pp. 434-488.

[2] K. L. Chung, “On mutually favorable events,” Annals of Math. Stat., Vol. 13 (1942),
pp. 338-349.

[3] M. Fréchet, Les probabilités associées à un système d’événements compatibles et
dépendants, Première partie, Hermann, Paris, 1939, p. 59.




APPROXIMATE WEIGHTS 
By John W. Tukey


Princeton University 


1. Summary. The greatest fractional increase in variance when a weighted 
mean is calculated with approximate weights is, quite closely, the square of the 
largest fractional error in an individual weight. The average increase will be 
about one-half this amount. 

The use of weights accurate to two significant figures, or even to the nearest
number of the form: 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 28, 30,
32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 55, 60, 65, 70, 75, 80, 85, 90, or 95, that is
to say, of the form 10(1)20(2)50(5)100 times a power of ten, can thus reduce
efficiency by at most ¼ percent, which is negligible in almost all applications.

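2. Proof. Let the optimum weights be $W_i$, $i = 1, 2, \cdots, n$, with $W_i > 0$,
where it is convenient to choose the normalization $\sum W_i = 1$. Let $\sigma^2$ be the
variance of $\sum W_i x_i$, then the variance of each $x_i$ must be $\sigma^2 / W_i$, and
since this is a weighted mean, the means of the $x_i$ are the same.

Let the approximate weights be $W_i(1 + \lambda\theta_i)$, where $0 < \lambda < 1$ and
$|\theta_i| \le 1$, $i = 1, 2, \cdots, n$. Thus $\lambda$ is the largest fractional error
which may be made in the situation considered. We need the weak requirement
$\lambda < 1$! The approximately weighted mean is

$$\frac{\sum W_i (1 + \lambda\theta_i) x_i}{\sum W_i (1 + \lambda\theta_i)} = \sum W_i\, \frac{1 + \lambda\theta_i}{1 + \lambda\bar\theta}\, x_i,$$

where $\bar\theta = \sum W_i \theta_i$. Its variance is

$$
\begin{aligned}
\sum W_i^2 \Bigl( \frac{1 + \lambda\theta_i}{1 + \lambda\bar\theta} \Bigr)^2 \frac{\sigma^2}{W_i}
&= \sigma^2 \Bigl\{ 1 + \frac{2\lambda}{1 + \lambda\bar\theta} \sum W_i (\theta_i - \bar\theta) + \frac{\lambda^2}{(1 + \lambda\bar\theta)^2} \sum W_i (\theta_i - \bar\theta)^2 \Bigr\}\\
&= \sigma^2 \Bigl\{ 1 + \lambda^2\, \frac{\sum W_i \theta_i^2 - \bar\theta^2}{(1 + \lambda\bar\theta)^2} \Bigr\},
\end{aligned}
$$

and, since $\sum W_i \theta_i^2 \le 1$, this is bounded by

$$\sigma^2 \Bigl\{ 1 + \lambda^2\, \frac{1 - \bar\theta^2}{(1 + \lambda\bar\theta)^2} \Bigr\}.$$

Now the only maximum of this expression for $|\bar\theta| \le 1$ occurs when
$\bar\theta = -\lambda$, and the bound becomes

$$\sigma^2 \Bigl( 1 + \frac{\lambda^2}{1 - \lambda^2} \Bigr) = \frac{\sigma^2}{1 - \lambda^2}.$$

This proves the first statement in the summary.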
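
The bound can be examined numerically. In the sketch below the variance of the approximately weighted mean, divided by $\sigma^2$, is computed directly from the variances $\sigma^2 / W_i$ and compared with the closed expression $1 + \lambda^2 (\sum W_i \theta_i^2 - \bar\theta^2) / (1 + \lambda\bar\theta)^2$ and with the limit $1/(1 - \lambda^2)$. The weights, the value of $\lambda$, the grid of fractional errors, and the function names are all assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product


def variance_ratio_direct(weights, lam, thetas):
    """Variance of the approximately weighted mean over sigma^2, using var(x_i) = sigma^2 / W_i."""
    approx = [w * (1 + lam * t) for w, t in zip(weights, thetas)]
    total = sum(approx)
    return sum((a / total) ** 2 / w for a, w in zip(approx, weights))


def variance_ratio_formula(weights, lam, thetas):
    """1 + lam^2 (sum W theta^2 - theta_bar^2) / (1 + lam theta_bar)^2."""
    theta_bar = sum(w * t for w, t in zip(weights, thetas))
    spread = sum(w * t * t for w, t in zip(weights, thetas)) - theta_bar ** 2
    return 1 + lam ** 2 * spread / (1 + lam * theta_bar) ** 2


weights = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]   # assumed optimum weights
lam = Fraction(1, 5)                                          # assumed largest fractional error
bound = 1 / (1 - lam ** 2)

grid = [Fraction(-1), Fraction(-1, 2), Fraction(0), Fraction(1, 2), Fraction(1)]
ratios = []
for thetas in product(grid, repeat=len(weights)):
    direct = variance_ratio_direct(weights, lam, thetas)
    assert direct == variance_ratio_formula(weights, lam, thetas)
    ratios.append(direct)

worst = max(ratios)
print(float(worst), float(bound))
```

For this small example the largest ratio found on the grid stays below the bound, which is attained only when $\bar\theta = -\lambda$ and every $\theta_i^2 = 1$.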

The greatest fractional change which occurs when a number is approximated
by one of the form 10(1)20(2)50(5)100 times a power of ten is 5/105, which occurs,
for example, when 10.499999 ··· is replaced by 10. The same estimate applies to
an approximation to two significant figures. The variance is thus multiplied
by a factor bounded by

$$1 + \frac{5^2}{105^2 - 5^2} < 1.0023,$$

which proves the second statement.

The use of a weight of the simpler form 10, 15, 20, 30, 40, 50, 70, times a
power of ten is seen in the same way to lead to an increase in variance and a
decrease in efficiency of at most 4¼ percent.
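
The largest fractional error for either list of permitted values can be found mechanically. The sketch below assumes rounding to the nearest permitted value, so that the worst case between adjacent values a < b falls at their midpoint and equals (b − a)/(a + b); the function names are hypothetical, and the value 100 is appended to each list to close the decade.

```python
from fractions import Fraction


def max_fractional_error(grid):
    """Largest fractional error when rounding to the nearest value of an increasing grid."""
    return max(Fraction(b - a, a + b) for a, b in zip(grid, grid[1:]))


def variance_factor_bound(lam):
    """Upper bound 1 / (1 - lam^2) on the factor multiplying the variance."""
    return 1 / (1 - lam ** 2)


# 10(1)20(2)50(5)100 and the simpler scheme, each extended to 100.
curtailed = list(range(10, 20)) + list(range(20, 50, 2)) + list(range(50, 101, 5))
simple = [10, 15, 20, 30, 40, 50, 70, 100]

for name, grid in (("curtailed", curtailed), ("simple", simple)):
    lam = max_fractional_error(grid)
    print(name, lam, float(variance_factor_bound(lam)))
```

For the curtailed list the worst error is 1/21, the same as 5/105, and the factor is 441/440; for the simpler list the worst error is 1/5 and the factor is 25/24.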

3. Comment. It is interesting to compare the 90 possible values for 2
significant figures, the 35 possible values for the numbers proposed above, which
might be called two curtailed significant figures, and the 24 possible values for
logarithmic spacing at interval $(1.05)^2$, all of which extend over one power of
ten with the same maximum fractional error in rounding. The use of the
curtailed scheme for critical tables of weights and weighting coefficients would save
more than 60 percent of the entries needed for two complete significant figures.

This device applies equally well to other numbers of significant figures. 







ON THE USE OF THE NON-CENTRAL t-DISTRIBUTION FOR
COMPARING PERCENTAGE POINTS OF NORMAL POPULATIONS


By John E. Walsh
Princeton University 


1. Introduction. Consider two normal populations with the same variance
and means $\mu$ and $\nu$ respectively. It is well known that confidence intervals and
significance tests can be obtained for the difference $\mu - \nu$. Since $\mu$ is the 50%
point of the first population and $\nu$ is the 50% point of the second population,
this represents a particular solution of the general problem of obtaining
confidence intervals and significance tests for the difference
$\theta_\alpha - \phi_\beta$, where $\theta_\alpha$ is the $\alpha$ percent point of the first
population and $\phi_\beta$ is the $\beta$ percent point of the second population. The
purpose of this note is to point out that the results of Johnson and Welch [1] for
the non-central t-distribution can be used to furnish a solution of the general
problem.

2. Analysis. Let $A_\gamma$ be the $\gamma$ percent point of the normal population with
zero mean and unit variance (i.e. exactly $\gamma$% of the population has values less
than $A_\gamma$). Then if $\sigma$ is the common standard deviation,

$$\theta_\alpha = \mu + A_\alpha \sigma, \qquad \phi_\beta = \nu + A_\beta \sigma.$$

Thus

$$\theta_\alpha - \phi_\beta = (\mu - \nu) + (A_\alpha - A_\beta)\sigma.$$

The non-central t-distribution investigated by Johnson and Welch in [1] is
based on the quantity

$$t = (z + \delta) \big/ \sqrt{\chi^2 / f},$$

where z has a normal distribution with zero mean and unit variance, $\delta$ is a
constant, and $\chi^2$ has a $\chi^2$-distribution with f degrees of freedom and is
distributed independently of z. Methods and tables are given in [1] whereby a
constant $t(f, \delta, \epsilon)$ can be computed having the property that

$$\Pr[t > t(f, \delta, \epsilon)] = \epsilon.$$

These relations will be used to obtain confidence intervals for
$\theta_\alpha - \phi_\beta$. The resulting confidence intervals can be used to obtain
significance tests for $\theta_\alpha - \phi_\beta$.





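Let $x_1, \cdots, x_n$ be a random sample of size n from the first population while
$y_1, \cdots, y_m$ is a random sample of size m from the second population. Then
consider

$$
\frac{\bigl[ \bar x - \bar y - (\theta_\alpha - \phi_\beta) \bigr] \sqrt{m + n - 2}}
{\sqrt{\sum (x_i - \bar x)^2 + \sum (y_i - \bar y)^2}\ \sqrt{\frac1n + \frac1m}}
=
\frac{\dfrac{\bar x - \bar y - (\mu - \nu)}{\sigma \sqrt{\frac1n + \frac1m}} + \dfrac{A_\beta - A_\alpha}{\sqrt{\frac1n + \frac1m}}}
{\sqrt{\dfrac{\sum (x_i - \bar x)^2 + \sum (y_i - \bar y)^2}{\sigma^2 (m + n - 2)}}}.
$$

This quantity has a non-central t-distribution with

$$\delta = (A_\beta - A_\alpha) \Big/ \sqrt{\frac1n + \frac1m}, \qquad f = m + n - 2.$$

For notational simplicity let

$$t(\epsilon) = t\Bigl( m + n - 2,\ \frac{A_\beta - A_\alpha}{\sqrt{\frac1n + \frac1m}},\ \epsilon \Bigr), \qquad \sum (x_i - \bar x)^2 = S_1^2, \qquad \sum (y_i - \bar y)^2 = S_2^2.$$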


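Then one-sided confidence intervals for $\theta_\alpha - \phi_\beta$ with confidence
coefficient $\epsilon$ are given by

$$\theta_\alpha - \phi_\beta < \bar x - \bar y - \frac{t(\epsilon) \sqrt{S_1^2 + S_2^2}}{\sqrt{(m + n - 2) \big/ \bigl( \frac1n + \frac1m \bigr)}},$$

$$\theta_\alpha - \phi_\beta > \bar x - \bar y - \frac{t(1 - \epsilon) \sqrt{S_1^2 + S_2^2}}{\sqrt{(m + n - 2) \big/ \bigl( \frac1n + \frac1m \bigr)}}.$$

Two-sided confidence intervals for $\theta_\alpha - \phi_\beta$ with confidence coefficient
$1 - (\epsilon_1 + \epsilon_2)$ are given by

$$\bar x - \bar y - \frac{t(\epsilon_1) \sqrt{S_1^2 + S_2^2}}{\sqrt{(m + n - 2) \big/ \bigl( \frac1n + \frac1m \bigr)}} < \theta_\alpha - \phi_\beta < \bar x - \bar y - \frac{t(1 - \epsilon_2) \sqrt{S_1^2 + S_2^2}}{\sqrt{(m + n - 2) \big/ \bigl( \frac1n + \frac1m \bigr)}},$$

where $\epsilon_1 + \epsilon_2 < 1$.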
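
The calculation of a one-sided limit can be sketched in a few lines. What follows is an illustration under stated assumptions, not the procedure of [1]: the percentage point of the non-central t-distribution is estimated here by simulation from its definition $t = (z + \delta)/\sqrt{\chi^2/f}$ rather than taken from the tables, the limit computed is the first of the one-sided forms (the difference of percentage points is less than $\bar x - \bar y$ minus the scaled $t(\epsilon)$ term), and all function names and sample values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def percent_point(gamma_percent):
    """A_gamma: the gamma percent point of the normal population with zero mean, unit variance."""
    return NormalDist().inv_cdf(gamma_percent / 100.0)


def noncentral_t_upper_point(f, delta, eps, draws=40000, seed=0):
    """Monte Carlo stand-in for t(f, delta, eps), where Pr[t > t(f, delta, eps)] = eps."""
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        z = rng.gauss(0.0, 1.0)
        chi2 = rng.gammavariate(f / 2.0, 2.0)   # chi-square with f degrees of freedom
        samples.append((z + delta) / math.sqrt(chi2 / f))
    samples.sort()
    index = min(draws - 1, int((1.0 - eps) * draws))
    return samples[index]


def upper_limit(xs, ys, alpha, beta, eps):
    """Limit L of the one-sided statement: (alpha point of x) - (beta point of y) < L."""
    n, m = len(xs), len(ys)
    xbar, ybar = sum(xs) / n, sum(ys) / m
    s1 = sum((x - xbar) ** 2 for x in xs)
    s2 = sum((y - ybar) ** 2 for y in ys)
    scale = math.sqrt(1.0 / n + 1.0 / m)
    delta = (percent_point(beta) - percent_point(alpha)) / scale
    t_eps = noncentral_t_upper_point(m + n - 2, delta, eps)
    return xbar - ybar - t_eps * math.sqrt(s1 + s2) * scale / math.sqrt(m + n - 2)


# Made-up samples; compare the 90% point of the first population with the 50% point of the second.
xs = [10.2, 9.7, 11.1, 10.5, 9.9, 10.8]
ys = [8.9, 9.4, 9.1, 10.0, 8.7]
print(upper_limit(xs, ys, alpha=90, beta=50, eps=0.05))
```

A simulated percentage point carries sampling error of its own, so for serious work the tables and methods of [1], or an exact numerical routine, would be preferable to this stand-in.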
REFERENCE 


[1] N. L. Johnson and B. L. Welch, “Applications of the non-central t-distribution”,
Biometrika, Vol. 31 (1940), pp. 362-389.




THE TEACHING OF STATISTICS 


A report of the Institute of Mathematical Statistics Committee on the 
Teaching of Statistics¹


PREFATORY NOTE 


This report on the teaching of statistics contains two parts. Part I is a sum- 
mary of the conclusions reached by the committee concerning the appropriate 
content and organization of teaching in statistics. It is oriented towards the 
future, and is intended as a program for action. Part II, mainly the work of the 
chairman of the committee, is a more intensive discussion of the general problem. 
It surveys the present state of the teaching of statistics, probes some of the 
reasons for existing weaknesses in this teaching, and states more fully the basis 
for the conclusions summarized in Part I. 

Additional material, with special reference to applied statistics, is contained 
in a report of The Committee on Applied Mathematical Statistics of the National 
Research Council, entitled Personnel and Training Problems Created by the 
Recent Growth of Applied Statistics in the United States.²


PART I 
SUMMARY OF CONCLUSIONS 


1. Who are the prospective students of statistics? A complete teaching pro- 
gram in statistics must be designed to meet the needs of four principal categories 
of students, listed here according to the amount of training in statistics that is 
needed to meet their requirements. 

a. All college students. Statistical method is a vital branch of scientific 
method. It is widely used in most sciences, business, government, and ordinary 
life. Some understanding of the nature of inductive inference from quantitative 
data on the basis of the theory of probability as portrayed in statistical method 
is an indispensable part of a liberal education. 

b. Future consumers of statistics. Some students will specialize in adminis- 
tration, business, or other subject-matter that will require them to understand 
the results of statistical analyses of special problems, although they themselves 
do not make these analyses. For example, business executives and government 
administrators must frequently base action on statistical studies. Research 
workers and teachers in many fields may not themselves use statistical methods, 
yet in order to keep abreast of their own or cognate fields they must read and 
understand studies using statistical methods. 

c. Future users of statistical methods. A still smaller group of students of 


1 The Committee consists of Harold Hotelling, Chairman; Walter Bartky, W. Edwards 
Deming, Milton Friedman, and Paul Hoel. 

2 Copies may be obtained from the National Research Council, 2101 Constitution Ave., 
Washington 25. 




statistics are training themselves for careers of specialization in economics, pop- 
ulation, sociology, housing, business, business research, industrial design, indus- 
trial production, personnel, purchasing, public opinion, biology, agricultural 
science, metallurgy, physics, chemistry, psychology, or some other field that 
makes extensive use of statistics. Research in these fields often requires the 
use of advanced statistical techniques, and even the development of new statisti- 
cal theory. Students planning to do such research need statistical theory and 
methods as a tool. 

d. Future producers and teachers of statistical methods. The smallest, but in 
many respects most crucial group of students of statistics, are those who intend
to specialize in statistical methods for the sake of statistical methodology. 
Many of these will become teachers or full-time research workers, though some 
will find posts in government and industry in high-grade statistical work, fre- 
quently requiring the development of new statistical theory and methods. 
These students will become tool-makers. 

2. What should they be taught? 

a. All college students.³ The fundamental logic and philosophy of statistics
can be taught at an early stage. It is perhaps an appropriate subject to include 
in the kind of survey courses of physical or social sciences that have become so 
common in recent years. Three or four weeks of lectures and discussions should 
suffice to acquaint the students with the broad principles of inductive inference. 
No mathematics need be included, although some elementary experiments may 
well be performed to instil the concepts of sampling variation, randomness, and 
statistical predictability. The student even at this stage can be made to recog- 
nize the fundamentally statistical character of most decisions, arising from the 
fact that they involve an element of uncertainty and a balancing of the impor- 
tance of different types of errors. The student can be made to understand the 
fundamental difference between inductive and deductive statements, the nature 
of statistical estimation, and the nature of a statistical hypothesis. These 
concepts can be made concrete by illustrating them in terms of problems ranging 
from everyday questions such as whether to cross a street in the middle of the 
block on up to such vital problems as the construction of an appropriate social 
security plan, or the design of an efficient experiment for selecting the best variety 
of corn, or the selection of the best method of testing for the presence of a disease. 

b. Future consumers of statistics. Future consumers of statistics need two 
kinds of training in statistics. First, they need some knowledge of the kind of 
statistical material available in their field of specialization; of the sources of 
such data; and of their limitations. To meet this need they require what may 
be called “descriptive statistics,” which places special emphasis on their own
field of specialization. A one-quarter or one-semester course in some depart- 
ment or division (e.g., in the social sciences, or biological sciences) should meet 


3 This recommendation is almost an exact parallel of one made by a committee on the 
teaching of statistics, appointed by the Royal Statistical Society and published by the 
Society in 1947 as a report to the Council; later published in the Journal of the Royal 
Statistical Society, vol. cx, Part I, 1947. 







this need. In addition, they need a reasonably thorough understanding of what 
statistics can and cannot do, what the major statistical techniques are, and how 
to interpret the results obtained by the application of such techniques. This 
need may be met for those students who have some mathematical background 
by all or part of the fundamental one-year course discussed in the next section. 
For students lacking this background, special courses along similar lines will 
be required. 

c. Future users of statistical methods. It is essential for fruitful application 
that users of statistical methods should not mechanically apply procedures 
learned by rote or taken from a manual. Since few research problems fit per- 
fectly into clearly defined patterns, nothing is so important to the successful 
collection and analysis of statistical data as adaptability and flexibility in using 
techniques. These require a thorough comprehension of the logical foundations 
of statistics, especially of the assumptions underlying its various technical 
devices, and sufficient knowledge of the derivations of these devices to be able to 
adapt them to the special circumstances that inevitably develop. To provide 
this background, a minimum of a full year fundamental course in statistical 
methods is essential, followed by courses of application. It is highly desirable 
that this fundamental course be based on calculus as a prerequisite, because with- 
out it a proper understanding of the development of statistical techniques cannot 
be attained. But this is probably impossible at present, in view of the unfor- 
tunately low level of mathematical training of most college students. As an 
expedient, and it is hoped a temporary expedient, it is recommended that the 
fundamental course be given in two sections, one requiring calculus, the other 
only a knowledge of first-year college algebra. A single course (or pair of courses, 
in line with the temporary expedient just mentioned) should suffice for all
departments, because the core of statistical methods is common to all fields of study.
Given in this way, the fundamental course can have the advantage of being 
taught by the most competent statisticians in the institution. 

In addition to a thorough training in theory and methods, users of statistical 
methods need training in applications. This can be provided by courses in 
various applied fields. It is usually advisable that these courses be given in the 
department of application (agriculture, population, engineering, economics, 
psychology, etc.), and require the fundamental one-year course as a prerequisite. 

d. Future research workers and teachers of statistical method. The future 
research workers and teachers of statistical method clearly require far more 
intensive training in theory than has so far been suggested. A fundamental 
prerequisite to such training is knowledge of some advanced mathematics. It 
is difficult to specify exactly what or how much mathematics is necessary, but 
something of the algebra of matrices and of the theory of functions are minimum 
necessities, and a good deal of additional knowledge of algebra, geometry, and 
analysis add richness and power to the work of the statistical theorist. 

In addition to advanced mathematics and advanced work in statistical method, 
the future statistical theorist needs a good deal of work on applications, in the 
form either of experience or courses. He will be a tool-maker, and needs to 













know by personal experience something of the problems of those who use his 
tools. One satisfactory arrangement is an internship in statistical research, 
as is currently provided by some institutions. By this arrangement, interns 
work under competent leadership in various government or private agencies 
that are engaged in large-scale statistical studies. The interns do research in 
theory, adapt the physical circumstances to theory and vice versa, and have 
actual practice in the design of experiments, the construction of questionnaires, 
writing of instructions, planning tabulations, analyzing the results, and exam- 
ining sampling variances. 

It is obvious that proper advanced courses in statistics will for many years be 
the province of a few institutions only, as there does not exist at present an ade- 
quate professional body to man more than a few. 

3. Who should teach statistics? It is clear from the preceding section that 
two different kinds of courses are required to meet the needs of students of 
statistics: first, courses in statistical method and methodology; and second, 
courses in applications of statistical methods to particular fields. 

The most important requirement for a successful university program in statis- 
tics is that courses in statistical method and methodology should be taught by a 
statistical theorist, a man who has had the training outlined in Art. 2d above, 
is specializing in statistics, is doing research in statistical method, and who has 
had some first-hand acquaintance with applications of statistical techniques. 
This is the only way such courses can be kept abreast of developments and 
sufficiently broad to meet the needs of all departments. This recommendation 
may seem to belabor the obvious, but a glance at the qualifications of most 
people currently teaching statistical methods will show why it is necessary. 

Most courses in applications should be taught by people thoroughly conver- 
sant with the relevant subject-matter fields as well as statistical methodology. 
Some courses in applications may be taught by statistical theorists, particularly 
new applications or applications that are common to many fields. 

4. How should the teaching of statistics be organized? The teaching program 
in statistics should be organized around a separate administrative unit, an Insti- 
tute or Department of Statistics. This department should be primarily respon- 
sible for the teaching of courses in statistical methods: the fundamental course 
in statistical method described above, specialized methods for particular fields 
of application (e.g., factor analysis, time-series analysis), and advanced courses 
in statistical theory. 

In addition, the department of statistics should offer its services as a consulting 
centre on problems in statistics arising in research in other departments of the 
institution, both as a service to these other departments and because research 
in statistical methods peculiarly requires stimulation from close communication 
with applications. The department of statistics might also provide laboratory
facilities for itself and other departments,⁴ and might undertake directly, or
through an associated research staff, special assignments involving the
application of statistical methods to concrete problems.

⁴ See the interesting suggestions on this point on p. 14 in Personnel and Training
Problems, loc. cit.

Intermediate courses dealing primarily with applications ordinarily belong in 
other departments (agriculture, economics, demography, engineering, biology, 
etc.), although some may be given in the department of statistics. The exact 
location of courses in application will depend on the accident of the depart- 
mental affiliation of the persons competent to teach them. Coordination of the 
teaching program in statistics can be achieved by an interdepartmental com- 
mittee. The department of statistics should not, however, consist of such a 
committee under a different name. It should be a thoroughly independent de- 
partment, with all or most of its members entirely in the department. 

The recommendation that the responsibility for teaching statistical methods be 
centered in a separate department is based on the belief that the teaching of 
statistical methods without theory can only be uninspiring and harmful; that a 
separate department of statistics offers the only arrangement that can assure 
statistical theory being taught by competent theorists, and the only satisfactory 
arrangement for ensuring the strong incentive for statistical research, with appro- 
priate recognition and advancement, which is as necessary for the teaching of 
statistics as for the teaching of any other subject. 

5. What should be done about adult education? The preceding recommenda- 
tions are all directed toward the teaching of statistics to undergraduate and 
graduate students. There is an additional need that these do not meet, namely, 
the provision of training to mature research workers in various fields already 
established in their professions. This need arises in part from the inadequate 
teaching of statistics in the past, but even more from the extremely rapid advance
in the theory and practice of statistics, which has made it difficult for any but
the specialist to keep abreast of developments. Some institutions are making 
efforts to meet this need by providing evening and late-afternoon classes for 
employed research workers. Such classes are feasible only in the larger centres 
of statistical activity. There is also the need of providing advanced research 
workers in particular fields with highly specialized guidance in selected topics. 
A department of statistics organized along the lines suggested above can con- 
tribute toward meeting this need by effective counseling of colleagues in other 
departments, and by organizing special seminars and lectures for them. The 
professional statistical associations are also contributing by arranging special 
expository programs. 













PART II 
THE PLACE OF STATISTICS IN THE UNIVERSITY⁵


Contents 


A. Minor nuisances and inefficiencies in statistical teaching 
6. Lack of coordination among departments. Lack of advanced courses and laboratory
facilities
7. Inefficient decentralization of libraries 
B. The major evil: failure to recognize the statistical method as a science, requiring spe- 
cialists to teach it 
8. Too many teachers not specialists 
9. Results: students ill equipped 
10. Reasons why teachers of statistics are often not specialists 
a. The rapid growth of the subject 
b. Confusion between the statistical method and applied statistics 
c. Failure to recognize the need for continuing research 
d. The system of making appointments to teach statistics within particular 
departments that are devoted primarily to other subjects 
11. Appointments under the existing system are not all bad 
12. Unsatisfactory texts 
13. Omission of probability theory from texts and teaching 
C. Proper qualifications of teachers of statistics 
14. Statistics compared with other subjects 
15. Current research in the statistical method is essential for teachers 
16. Minimum requirements in mathematics for the training of teachers and research 
men in statistical theory 
D. Need for relating theory with applied statistics 
17. An example of the interaction between theory and practice 
18. Supplying opportunities for application in graduate studies of statistics 
E. Recommendations on the organization of statistical teaching and research in institu- 
tions of higher learning 
19. Research should be encouraged; teaching schedules should not be overloaded 
20. Organization of statistical service in the university 
21. Organization for teaching 
22. The statistical curriculum 
23. Statistical method as part of a liberal education 


A. MINOR NUISANCES AND INEFFICIENCIES IN STATISTICAL TEACHING 


6. Lack of coordination among departments. Lack of advanced courses and 
laboratory facilities. The teaching of statistics in American colleges and uni- 
versities, which has for the most part been a development since the first world 
war and has now reached large proportions, presents a number of unsatisfactory 
features. Courses in statistical methods are taught in various departments, 
without coordination or inter-communication. These courses cover what is to 
a large extent the same material, but with many variations in the selection of 
subjects according to the ideas and abilities of individual instructors, and with
illustrative examples drawn in each case from material pertaining to the
department in which the course is taught. Thus a student desiring to learn more
about statistics than he can obtain in one department must, in taking courses in
other departments, repeat a great deal of what he has previously covered.

⁵ An earlier version of this part, prepared entirely by the chairman, is being published
by the University of California Press in a report of a symposium on probability and
statistics. The Committee as a whole made and adopted the present condensation, with
W. Edwards Deming and Milton Friedman contributing most of it. Publication of the
Berkeley symposium, including the more detailed original, has been delayed, but it is
expected to appear soon.

There is a plethora of elementary courses and a dearth of advanced ones. 
Some departments have excellent statistical laboratories which they reserve for 
the use of their own students, each with an attendant to keep others away, while
other departments have none. Some classes in elementary statistics are too 
large and some too small, with no one in a position to equalize the sections be- 
tween different departments. 

7. Inefficient decentralization of libraries. The library situation is confused. 
Books on statistical methods are catalogued and shelved under Sociology, 
Economics, Business, Psychology, Zoology, Botany, Engineering, and Medicine. 
Books on probability are divided between Philosophy, Mathematics, Physics, 
and Chemistry. Books on the method of least squares are for the most part 
divided between Mathematics, Astronomy, and Civil Engineering, though some 
get into the Economics, Geology, and Physics reading-rooms. Works on the 
analysis of variance and design of experiments are likely to be concentrated under 
Agriculture, while methods of approximate evaluation of multiple integrals and 
similar purely mathematical subjects of use in statistics are, at least in one of our 
largest universities, to be found only in the library of Biology. 


B. THE MAJOR EVIL: FAILURE TO RECOGNIZE STATISTICAL METHOD AS A 
SCIENCE, REQUIRING SPECIALISTS TO TEACH IT 


8. Too many teachers not specialists. The above nuisances are but minor. 
The major evil is that those attempting to teach statistical method are all too 
often not specialists in the subject. Their original selection was seldom on the 
basis of scholarship in this field; they are not encouraged to make advanced 
studies in it; and their environment is such as to draw their attention in every 
direction except to the central truths and problems of their science. Frequently 
they lack the knowledge of mathematics necessary to begin to read the more 
serious literature of the subject that they are teaching. Many have been utterly 
unable to keep up with the rapid progress which has been taking place in statisti- 
cal methods and theory, progress which affects even the most elementary things to 
be taught. 

9. Results: students ill equipped. There results a widespread teaching of 
wrong theories and inefficient methods. Students are sent to the government 
service and to industrial and commercial statistical positions equipped with the 
skill that results from careful drilling in methods that ought never to be used. 
Some of these same students are encouraged and assisted to become college and 
university teachers of statistics without ever making thorough-going studies of 
the fundamentals of the subject, or exhibiting any power of making original con- 
tributions to it, or studying any graduate mathematics. Through the method of 
selection of teachers in general use, and through textbooks written by individ- 
uals of this type, there is a perpetuation of obsolete ideas and unsound methods. 

All this does not mean that any considerable number of people teaching statistics
are unworthy or objectionable members of the academic community. Many,
indeed, are of superior intellect, upright character, personal charm, and un- 
doubted teaching ability. Some are making creative contributions to other sub- 
jects. The only trouble is that they are teaching a subject in which they are not 
specialists, and which progresses so fast that only specialists can keep up with it. 

10. Reasons why teachers of statistics are often not specialists. The chief 
reasons for the extensive teaching of statistical method by people who are not 
specialists in it appear to be the following: 

a. The rapid growth of the subject and multiplication of its applications, creating 
a very large and very urgent demand for teaching it that could not be met im- 
mediately by the small existing number of scholars specializing in statistical 
method. This difficulty is aggravated by the paucity of university facilities 
for training advanced scholars in the field, so that even now the available number 
of such scholars cannot be expanded with sufficient rapidity to meet the current 
need. As specialists have not been available in anything like sufficient numbers, 
statistical method has inevitably been taught largely by non-specialists. 

b. Confusion between statistical method and applied statistics. Statistical 
method is a coherent, unified science. ‘Applied statistics’ may mean any of 
thousands of diverse things. Any particular study in applied statistics will 
ordinarily utilize some few of the results obtained by the science of statistical 
method, but will be largely concerned with matters peculiar to the particular 
application in view and others closely related to it. For example, studies of 
business cycles utilize statistical methods, good or bad, with a view to drawing 
inferences from existing data on prices, production, incomes, interest rates, bank 
reserves and the like. The main job of the applied statistician in this field is to 
study the sources and nature of the various series of observations, keeping in 
mind incidental events which may break the continuity of a series, and watching, 
with a background of economic theory and knowledge of the facts, for explanations.
He should also be well acquainted with statistical theory, since otherwise
there is grave danger of wasting or misinterpreting the laboriously
accumulated observations. Indeed, an organization studying business cycles, or solar
cycles, or rat psychology or cancer or practically anything else, would almost 
certainly benefit from participation by a specialist in statistical method. 

However, the chief attention in any such study will not be on statistical method 
but on features peculiar to its own scope. The specialist in statistical method 
will do well to participate occasionally in such a study, but if he does so too ex- 
tensively the needs of the application will so engross his attention that he cannot 
keep up with the progress of statistical method itself. 

The call of applications is enticing, and has led many young scholars to forsake 
the cultivation of statistical theory. The applications have benefited greatly 
by the process. Moreover, problems brought back in this way from applica- 
tions have provided valuable inspiration in developing theory. The mistake 
lies in supposing that participation in applied statistics is equivalent to specializa- 
tion in statistical method and theory, and the consequent appointment to teach 
the latter of persons whose sole concern is with the former. 







c. Failure to recognize the need for continuing research in the theory of statistics 
by those who teach it. There is an easy tendency to assume that all the requisite 
ideas and formulae can be found in some book, and that the duty of the teacher 
of statistics is simply to transfer this established book-knowledge to the minds 
of the students and impart to them skill in applying it. Similar attitudes ap- 
plied to other subjects have in the past been a drag on progress, and have long 
been discarded in respectable universities. They still hang on, however, even 
in the best institutions with respect to statistics. The spectacular advances of 
the last three decades in statistics should make it clear to anyone who has followed 
them that statistical method is far from static, that the best techniques of present- 
day statistics may tomorrow be replaced by something better, and that un- 
solved problems regarding the theory and methods of statistics are sticking out 
in every direction. A vast amount of research, mostly of a highly mathematical 
character, is needed and is in prospect. Anyone who does not keep in active 
touch with this research will after a short time not be a suitable teacher of statis- 
tics. Unfortunately, too many people like to do their statistical work as they 
say their prayers—merely substitute in a formula found in a highly respected 
book written a long time ago. 

d. The system of making appointments to teach statistics within particular depart- 
ments that are devoted primarily to other subjects. In effect, the teacher of statisti- 
cal method is too often selected by economists or sociologists or engineers or 
psychologists or medical men because he is to teach in one of these departments. 
Thus the task of selection devolves upon people unacquainted with the subject, 
though realizing the need for it in connection with a very specific application. 
Under such conditions there is an inevitable tendency to emphasize the immed- 
iately practical and specific at the expense of the fundamental work of wider 
applicability and greater long-run importance. Confusion between a science and 
its applications is most pronounced with those who know little about it, and the 
distinction between statistical method and applied statistics is likely to be com- 
pletely lost when a sociologist or an engineer is confronted with the problem of 
finding someone to teach statistics. If he does make the distinction at all he is 
likely to choose in favor of applied statistics. 

Strangely, the actual teaching that ensues is bound to consist largely of sta- 
tistical theory, because the students will ordinarily not have had statistical theory 
elsewhere, and they must have some in order to apply it. What often happens is 
that a sociologist or an engineer who has made some study of statistics embarks 
on what he thinks will be a career of teaching the application of statistical method 
to sociological or engineering problems, only to discover that because of the 
ignorance of the students he is compelled to teach the fundamentals of statistics, 
an entirely different subject for which he lacks preparation, talent, and interest. 

An incident of this sort has been cited previously.⁶ A prominent economist
was asked to teach a course entitled “Price forecasting” in a leading university,
and accepted. He found, however, that his lectures on this subject were over
the heads of the students because he was using statistical concepts unfamiliar to
them. He therefore went back over the ground covered so as to explain these
particular statistical concepts along with their application. But in explaining
them he found himself using other statistical concepts, which in turn called for
explanation. At the end of the semester he found that he had not given the
course in price forecasting which he had planned, and for which the large class had
enrolled, but instead had taught a somewhat disordered course in elementary
statistics, a subject in which he did not feel particularly competent, and for which
the students had not come. When he was asked to teach price forecasting a year
later he proposed that a prerequisite of a course in statistics be imposed, but this
proposal was rejected by the chairman of the department, and the course was not
repeated.

⁶ Harold Hotelling, “The teaching of statistics,” Annals of Math. Stat., vol. xi, 1940,
pp. 457-470.

11. Appointments under the existing system are not all bad. More by acci- 
dent than by design in the existing system, not all statistical appointments by 
departments of application are bad. Some professors in these departments make 
conscientious excursions into statistical theory, are well advised by competent 
specialists in statistics, and bring about the appointment of men of high quality 
well acquainted with statistical method and theory of the currently best sort. 
This may work out well if the man so appointed is an able and energetic scholar 
deeply devoted to his subject, if he is placed immediately in the highest professorial
rank, and if he does not feel under obligation to devote himself too exclusively
to the special interests of the department of which he finds himself a
member. He is then free to pursue his specialty, to keep informed on the latest 
developments in statistical method and himself to add to the subject, while at 
the same time transmitting to students a well rounded and up-to-date selection 
of knowledge. It is in this way that some of the present leaders in statistics have 
developed. It is a wrong procedure, however, to depend on accidents of this 
kind. 

The system of departmental organization and of making appointments and 
recognizing proficiency in the teaching of statistics needs to be altered. The 
usual story is typified by the appointment of a promising young scholar in sta- 
tistical method to a junior position in some department of application where he 
is expected to work on problems and to teach statistical methods with a sole eye 
to the work of the specific department. He is then under pressure to concentrate 
on a particular kind of applied statistics, for his advancement will depend, not 
on his statistical attainments at all, but on his study of the literature, termin- 
ology, techniques and theories of the application. His usual associates will be 
in the department in which he is teaching rather than others teaching statistics. 
The loss, although not total, is great, because the opportunity to make the most 
of the man’s statistical ability is lost, and his ability as an economist, agricultural 
scientist, engineer, or something else that he is not particularly fitted for, is 
substituted. 

A still less favorable circumstance, and unfortunately more common, is that in 
which the teacher of statistics is not even selected for scholarship in the theory 
of statistics. Studies in some other field, with some slight dabbling in the application
of statistical methods to it, plus a pleasing personality, have all too frequently
been thought to comprise sufficient qualifications for teaching statistical
methods and theory. 

12. Unsatisfactory texts. The uncritical character of the teaching is reflected 
in the long line of textbooks written by teachers who have not made any gen- 
uinely fundamental study of statistics, but pass on to students in a magisterial 
fashion what was passed on to them. Authority takes the place of derivations 
and ultimate sources. It is no wonder that these textbooks, copied from each 
other, contain increasing accumulations of errors; or that long delays have inter- 
vened between the introduction of important new statistical methods and theories 
in the periodical literature and their appearance in the textbooks and courses 
put before students. 

The latest discoveries in the theory of statistics affect what should be taught 
in elementary courses, and no syllabus can be expected to survive more than a 
few years of research. The development of new statistical methods and ideas 
of overwhelming importance must be allowed to compete with material already 
well established as true and useful. The new material is equally true and in 
some cases even more useful than matter usually incorporated in the best of 
current courses and textbooks. 

13. Omission of probability theory from texts and teaching. One of the im- 
portant weaknesses in much of the current teaching of statistics is a failure to 
make proper use of the theory of probability. Without probability theory, sta- 
tistical methods are of only minor value, for although they may put data into 
forms from which intuitive inferences are easy, such inferences are very likely to 
be incorrect. The objective weighing of the degree of confidence to be placed in 
inductive conclusions is necessary to avoid fallacies. Indeed, the whole founda- 
tion of descriptive statistical methods, of inductive inference, and of the design 
of experiments, rests upon probability theory. 

The relevance of probability to much statistical work was indeed questioned a 
quarter-century ago by a group of economists impressed by the lack of independ- 
ence between consecutive observations, and this attitude, in conjunction with 
an exaggerated and belated remnant of nineteenth-century empiricism, has had 
a certain influence, particularly on the statistical methods in use by economists. 
This view is now rapidly giving way to a tendency to use the powerful new sta- 
tistical methods discovered in the meantime. It is now perceived that efficient 
objective methods can be used over a much wider range of cases than was formerly 
supposed, because the independence assumed in their derivations refers not to 
observations but to residuals from the theoretical model used. Furthermore, 
research is under way, and has already achieved promising results, on the exten- 
sion of accurate methods to still more extensive classes of problems. 
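
The distinction between dependence among observations and dependence among residuals can be illustrated numerically. What follows is only a minimal sketch under stated assumptions: the straight-line trend, the sample size, the noise level, and the helper names `lag1_autocorr` and `fit_line` are hypothetical choices made for illustration and are not taken from this report. Consecutive observations on a trending series are strongly correlated, yet the residuals from the fitted model behave very nearly as independent draws.

```python
import random


def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    num = sum(dev[i] * dev[i + 1] for i in range(n - 1))
    den = sum(d * d for d in dev)
    return num / den


def fit_line(ts, ys):
    """Ordinary least-squares intercept and slope of ys regressed on ts."""
    n = len(ts)
    t_bar = sum(ts) / n
    y_bar = sum(ys) / n
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    sxx = sum((t - t_bar) ** 2 for t in ts)
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope


# Hypothetical series: linear trend plus independent normal errors.
rng = random.Random(0)
ts = list(range(200))
ys = [10.0 + 1.0 * t + rng.gauss(0.0, 1.0) for t in ts]

intercept, slope = fit_line(ts, ys)
residuals = [y - (intercept + slope * t) for t, y in zip(ts, ys)]

r_obs = lag1_autocorr(ys)         # dependence between consecutive observations
r_res = lag1_autocorr(residuals)  # dependence between consecutive residuals
print(f"lag-1 autocorrelation of observations: {r_obs:.3f}")
print(f"lag-1 autocorrelation of residuals:    {r_res:.3f}")
```

The first figure should be close to one and the second close to zero, which is the sense in which methods assuming independence may remain applicable to series whose raw observations are plainly dependent.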


C. PROPER QUALIFICATIONS OF TEACHERS OF STATISTICS 


14. Statistics compared with other subjects. The qualifications appropriate 
for teachers of statistical method and theory are not essentially different in degree 
from those for teachers of other subjects in the same institutions; proficiency in
statistical method and theory is merely to be substituted for it in other subjects.
This substitution is, however, vital. It must not be imagined that proficiency 
in some other subject in which statistical methods are used incidentally is equiv- 
alent to proficiency in statistical method itself. The error of such a supposition, 
if carried over into another field, might lead to the appointment of a man as pro- 
fessor of chemistry on the ground that he could cook. 

The first requisite of the college or university professor of any subject is a pro- 
found and thorough knowledge of that subject. It is customary in the better 
institutions at least to restrict appointments to the rank of assistant professor to 
persons who have demonstrated scholarly qualifications by work equivalent to 
that leading to a Ph.D. degree, including an original contribution to the subject 
that the individual is to teach. Promotion to the higher ranks is conditioned 
upon a number of criteria, among which published research is by far the most 
important in those institutions. 

15. Current research in statistical method is essential for teachers. Research 
is even more essential in the teacher of statistics than in teachers of most other 
subjects, because so much remains to be worked out that is of immediate impor- 
tance. Some college teachers do no research. This is usually regarded as de- 
plorable. The evil is, however, of quite different magnitude according to the 
nature of what is taught by such teachers. In a new subject in which sharp
differences of opinion exist or have recently existed on fundamental questions, 
in which current discoveries have an important bearing, and in which there have 
not yet been the time and consensus necessary for the preparation of an adequate 
and virtually error-free textbook, teaching without research may have calamitous 
effects. The effective teacher must, of course, have teaching ability, but no 
skill in pedagogy, no lustre of personality, can atone for teaching errors instead 
of truth. Errors are very likely to be taught by those who do no research, and 
then the more skillful the pedagogic indoctrination, the greater the harm. 
Sound educational policy calls for devotion to research of a large fraction of the 
time and energies of the teaching staff in a subject like statistical theory. Stu- 
dents also are in particular need of encouragement to do original and critical 
work in relatively new areas of this kind. They must be taught to shun the 
use of formulae and methods given merely on authority without full and con- 
vincing reasons, and to insist on looking closely and critically at assertions. 

Even in the teaching of elementary statistical methods for direct practical 
use by specific occupational groups, where it might be thought that the teaching 
would most predominate over the research element, the teacher must face diffi- 
cult questions whose answers call for research in statistical theory. Let us 
illustrate this by one example out of the many possible. In teaching the analysis 
of variance for use in agricultural experimentation, questions arising out of the 
possible non-normality of the underlying distributions must be dealt with in 
some way. The formulae, even those in the best textbooks, are accurate only 
if the distribution is normal, and neither this fact nor the non-normality of many 
distributions should be concealed from the students. Obviously something more
needs to be said on the subject at this point. What the teacher can say depends
on how deep he has gone into a whole series of perplexing questions, on some of 
which the views of scholars are not yet stabilized, and on which a tremendous 
amount of research is needed before the maximum practical value can be attained 
for a technique whose usefulness is already amazing. 
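
One way a teacher might let students examine the question for themselves is by random sampling experiments. The sketch below is an illustration under assumptions of our own choosing, not a method prescribed in this report: three groups of ten observations, no true differences between groups, a lognormal distribution as the non-normal case, and the tabulated upper 5 per cent point 3.354 of F with 2 and 27 degrees of freedom. The names `f_statistic` and `rejection_rate` are hypothetical. It estimates how often the usual F test rejects a true hypothesis under each distribution.

```python
import random

# Upper 5% point of F with (2, 27) degrees of freedom, from standard tables.
F_CRIT_2_27 = 3.354


def f_statistic(groups):
    """One-way analysis-of-variance F ratio for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def rejection_rate(draw, trials=2000, k=3, n=10, seed=1):
    """Fraction of trials in which F exceeds the tabulated 5% point
    when all k groups are drawn from the same distribution.
    The default k and n must match the degrees of freedom of F_CRIT_2_27."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        groups = [[draw(rng) for _ in range(n)] for _ in range(k)]
        if f_statistic(groups) > F_CRIT_2_27:
            rejections += 1
    return rejections / trials


normal_rate = rejection_rate(lambda rng: rng.gauss(0.0, 1.0))
skewed_rate = rejection_rate(lambda rng: rng.lognormvariate(0.0, 1.0))
print(f"rejection rate, normal data:    {normal_rate:.3f}")
print(f"rejection rate, lognormal data: {skewed_rate:.3f}")
```

Under normality the estimated rate should lie near the nominal 0.05. Whatever the sketch shows for the skewed case applies only to the particular distribution and sample sizes assumed here, and such an experiment is no substitute for the theoretical research referred to above.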

16. Minimum requirements in mathematics for the training of teachers 
and research men in statistical theory. Because research in the theory of statis- 
tics requires advanced mathematics, and is indeed largely mathematical in 
character, a mastery of a substantial amount of higher mathematics must be an 
essential part of the training of prospective teachers of statistics. To specify 
exactly what or how much mathematics is necessary would be a difficult task. 
Something of the algebra of matrices and of the theory of functions is a minimum
necessity, and a good deal of additional knowledge of algebra, geometry, and
analysis adds richness and power to the work of the statistical theorist, the
inventor of new statistical methods. On the other hand, the time of the graduate
student in statistics is much occupied with the theory of statistics itself; and some
of his time should also go into the study of applied statistics. If the students 
entering a graduate school for advanced work in statistics went there equipped 
with a knowledge of matrix algebra and theory of functions and some additional 
higher mathematics, as is obtainable by undergraduates at some institutions, 
they would have time for applied statistics and could do some real work on 
applications. 

There is a cruel dilemma here, resulting from the delay in learning mathematics 
imposed by the elementary curricula which have become customary in this coun- 
try. The weakness of the mathematical element in the prevailing curricula 
affects both teachers and students of statistics to an extent justifying some atten- 
tion from those interested in the improvement of statistics. In American uni- 
versities elementary calculus is not often taught before the sophomore year, and 
the more advanced parts of algebra come still later, if at all. 

If calculus could be pushed down into the high schools and assumed as a pre- 
requisite for college courses in mathematics, statistics, economics, physics and 
several other subjects, the efficiency of instruction in all these departments could 
be increased. For example the difficulties experienced by students of economics 
with ideas of marginal cost, marginal revenue and the like correspond closely 
with the difficulties experienced by mathematicians for centuries in trying to 
define infinitesimals and derivatives, but now successfully overcome. The 
student who really knows differential calculus need not have the slightest diffi- 
culty with the marginal ideas of economics. Similarly in physics, the funda- 
mental concepts of speed, acceleration, potential theory, conductivity, thermal 
capacity and radiation, are all mathematical and easier to grasp once and for 
all as such than to be learned afresh with each new application from textbooks 
in physics sometimes not clearly written and taught by teachers who must for 
one reason or another avoid a mathematical approach. 

The possibilities of teaching quite advanced mathematics to young children
have scarcely begun to be explored. Children of kindergarten age are fascinated 
and thrilled by the wonders of topology. Groups and number theory can be 
tremendous sensations in the fifth grade, though all these subjects are ordinarily 
reserved for graduate students specializing in mathematics. What is lacking 
is teachers who know mathematics and its applications and who possess enough 
freedom to teach what they know instead of the long, dull and relatively useless 
drill on problems of wallpaper hanging and the like, problems turning on mere 
conventions which are quickly forgotten, painful repetitious work which makes 
children resolve to quit mathematics as soon as possible. 


D. NEED FOR RELATING THEORY WITH APPLIED STATISTICS 


17. An example of the interaction between theory and practice. A professor 
of psychology working with mental tests might enlist the assistance of a young 
statistical theorist with mutual benefit. The young man might for a short time 
do some of the drudgery of scoring tests and computing, passing on soon to the 
problems of test construction and the distribution of various functions of cor- 
relation coefficients. This last is on a new and exciting frontier of statistical 
theory. The advancement of this frontier, which is really the main business of 
the young man in his capacity as prospective statistical theorist, would in this 
way come to him naturally as a problem or series of problems having a tangible 
meaning additional to its mathematical content. The empirical context is in 
such cases often of great value in suggesting suitable approaches, for example, 
suitable approximations in the study of functions not susceptible to simple 
mathematical representation in terms of elementary functions. 

If the young theorist succeeds in extending the boundaries of multivariate 
statistical analysis by discovering the distribution of some new function of cor- 
relation coefficients, the chances are that this discovery will also have applica- 
tions in anthropology, medicine, banking, and other pursuits which in the aggre- 
gate will greatly outweigh the application originally in view. 

The discovery should be regarded primarily as a contribution to the general 
theory of statistics, and published in a journal devoted to mathematical statistics. 
It will then become available to a wide circle of teachers of statistics, who may 
incorporate it into their courses, and its methods and results will be studied by 
other investigators from the standpoint of possible generalizations and analogs. 
The importance of the discovery would be much more limited if it were thought 
of as a development in psychology and published only in a psychological journal. 
Perhaps dual or multiple publication ought to be permitted in such cases, but the 
first publication should be in a journal of mathematical statistics. Far too many 
good statistical ideas have been buried in connexion with obscure special applica- 
tions. 

18. Supplying opportunities for applications in graduate studies of statistics. 
The statistician who does any work in applications must know statistics as an 
art as well as a science. The theoretical statistician, if he wishes to be of the 
utmost use to his colleagues in other disciplines, needs to know by personal
experience something of their lives and collateral problems. Indeed, experience 
with applications, and the challenge of problems arising out of applications, have 
played a most important part in the development of statistical theory. It 
follows that the graduate student in statistics needs contact with applied statis- 
tics which the institution should undertake to provide, or at least facilitate. 
This need is next in importance after the needs for theoretical statistics and for 
pure mathematics. The distribution of time among the three (theoretical
statistics, mathematics, and applied statistics) is hard to specify exactly, and
must in any case depend on the nature of the student’s previous work. If his 
mathematical preparation has been full and rich, more time should be spent on 
applied statistics in his graduate years than if he has already had substantial 
contact with applied statistics in some other way but is deficient in higher 
mathematics. 

Applied statistics entails a somewhat detailed acquaintance with the field of 
application. Such a field might be life insurance, or mental testing, or industrial 
quality control, or sampling in the work of the Bureau of the Census or some 
other government agency; it might be agricultural economics, or business cycles. 
Proficiency in any such field calls for rather prolonged study, and it would be too 
much to expect the embryo statistical theorist to reach this stage of advance- 
ment in all subjects. He should, however, make more than a superficial study 
of some chosen field of application. This study might or might not be at the 
university. The requisite familiarity with applied statistics might in some cases 
be acquired by work in a government bureau, or in a research organization study- 
ing business cycles or something else involving applied statistics. What is most 
desirable is that the work should have brought the student to the point both of 
applying statistical methods in a reasonably effective way, and of perceiving the 
limitations of existing statistical methods. Perception of existing limitations 
has frequently been the germ of progress in the subject. 

One satisfactory arrangement is an internship in statistical research, as is
currently provided by some institutions. By this arrangement, interns work 
under competent leadership in various government or private agencies that are 
engaged in large-scale statistical studies. The interns do research in theory, 
adapt the physical circumstances to theory and vice versa, and have actual 
practice in the design of experiments, construction of questionnaires, writing 
of instructions and tabulation plans, analysis of the results and appraisal of 
sampling variances. 


E. RECOMMENDATIONS ON THE ORGANIZATION OF STATISTICAL TEACHING 
AND RESEARCH IN INSTITUTIONS OF HIGHER LEARNING 


19. Research should be encouraged; teaching schedules should not be overloaded.
Colleges and universities usually expect the members of their faculties
to engage in research as well as in teaching, the relative emphasis on these two 
functions varying greatly from institution to institution and to a lesser extent 
among departments within the same institution. Reasons why teachers of
statistics must do current research in order to teach the subject have already 
been given in Art. 15. In the organization of statistical teaching it is thus of 
extraordinary importance that colleges and universities emphasize research in 
the theory of statistics as a leading part of the work of the teaching staff in this 
field. Hours of teaching and other duties must be kept within such bounds as
to make research possible, the initial selection of teachers must be of persons 
capable of research in statistics, and there must be provision of needed secretarial, 
computational and other assistance. The library must be adequate, not only in 
publications containing statistical theory, but in the larger field of pure mathemat- 
ics as well. 

20. Organizing statistical service in the university. In addition to the cus- 
tomary duties of teaching and research, faculty members expert in statistical 
methods find that they cannot escape a third, viz., advice to their colleagues and
others regarding the statistical aspects of their problems. This often takes a 
good deal of time. Clearly it is in the interest of the academic enterprise that 
such services be provided. Scholars in many departments are finding that their 
work is greatly improved by competent statistical advice not only in the inter- 
pretation of their data but also in the design of their experiments and other 
investigations. The provision of competent advice frequently requires extended 
consideration of the general content of the problem as well as special analysis of
its statistical features. And initial advice often needs to be supplemented by 
further service. The statistician, like the physician, often finds that one inter- 
view at which a prescription is dispensed does not end the matter satisfactorily. 

Teaching hours must be distinctly limited if statisticians are to be able to 
render this service to the rest of the institution as well as maintain a high level 
of research in their own field. 

One way to handle the problem of statistical service, especially in a large 
institution, is through a special organization devoted to this purpose. Such an 
organization, whether called a Statistical Institute, a Department of Applied 
Statistics, Statistical Laboratory, or something else, might supply not only 
advice but a more active kind of assistance, including computational and chart- 
drawing services. 

A statistical service organization should be removed from the teaching of statis- 
tics only to the extent necessary to gain the advantages of some degree of special- 
ization and to prevent undue interruption of the teacher’s other work of teaching 
and of research in theory. There are distinct advantages for all parties in a 
fairly close connexion between practical statistical work, research in statistical 
theory, and statistical teaching. Each of these activities benefits the others, 
provided only that it does not take away from it too much time. Research in 
statistical theory, like medical research, needs frequent revitalizing injections of 
specific practical problems. It also needs the stimulus of contact with students. 
The teaching of statistical method is made more vigorous both by research in 
the subject and by the presence of applications with which students can be con- 
fronted. And the needs of applications are better met if through an organization
such as is here envisaged they can be brought to the attention of appropriate
specialists, and if also students can be enlisted when needed for their treatment. 

A university organization dealing with statistics may properly comprise two 
parts with overlapping personnel, one devoted chiefly to applied statistics, the 
other to theoretical statistics. The teaching might be done by both, but at 
least at the more advanced levels would be primarily the concern of the theoreti- 
cal part. Migration between the two ought to be easy and frequent, though some 
individuals are so definitely adapted to one kind of work or the other as to make 
it undesirable to have fixed rules calling for periodic transfers. 

In smaller institutions it may not be practicable to have statistical organiza- 
tions sufficiently well staffed to provide adequate consulting service. To meet 
the needs in some of these cases regional centres for advice and service in applied 
statistics might be established at large universities throughout the country, 
with access made readily available for sister institutions. These centres might 
also carry on work in applied statistics in behalf of government agencies and other 
organizations, much as various agricultural colleges have for years been carrying 
on cooperative work with the federal Department of Agriculture. 

The question how far, if at all, such a university centre of applied statistics 
should go into the market place and engage commercially in service to business 
concerns is a debatable one. While there may be favorable reactions upon 
scientific work, there are grave dangers to the intellectual integrity of the in- 
stitution which need serious consideration. 

21. Organization for teaching. Passing from questions of personnel and the 
research and service functions of academic statisticians to teaching itself, we have 
to consider problems of departmental organization, of course contents, of systems 
of prerequisites, and of methods of teaching. All these we consider secondary 
problems, not in the sense of being unimportant, but because we believe that 
proper solutions of them will be reached with reasonable promptness when 
personnel of the kind described in Sec. C of this report are at work in some such 
general setting as has just been described. The ideas recorded below are general 
in character and are to be regarded as a starting-point for developing a program 
in a particular institution, once suitable faculty members have been obtained. 

The teaching of statistics may be organized in any of the following ways: 

a. In a department of theory and a department of applied statistics, both
   forming an Institute of Statistics.
b. In a single Department of Statistics.
c. Under an inter-departmental committee.
d. Under the exclusive jurisdiction of the Department of Mathematics.
e. It may be scattered among heterogeneous departments of application,
   without formal coordination.

Only a few large institutions will be in position to adopt the first plan. It is 
likely that the second will be most suitable for the majority. The third should 
probably be regarded as a makeshift for the transitional period until a proper 
department of statistics can be organized, a step that will not at the moment be
reasonably possible for most institutions because the right kind of scholarly 
personnel does not exist in adequate numbers. It is of course possible that some
vestige of an inter-departmental committee, perhaps in the form of an Advisory 
Board, might be a useful adjunct of a department of statistics in order to keep it 
informed of the needs of applications. It is also possible that something of the 
sort might function with respect to a department of mathematics, or any other 
department. On the other hand, the desired consultations and adjustments 
might be accomplished in less formal ways. 

To make statistics a subdivision of a mathematics department is a solution 
that will appeal to administrators desirous of keeping down the number of de- 
partments. The subject-matter of statistics is to a sufficient extent mathemati- 
cal to give some apparent weight to this plan, and some mathematicians have the 
unsound idea that any mathematician can teach statistics without specialized 
study or experience in application. On the other hand, statistics has some 
features uncongenial to traditional mathematics, arising partly from the urgency 
of practical needs which go beyond what can immediately be provided by rigor- 
ous mathematical theory. Again we may cite the problem in the teaching of the 
analysis of variance of what to do about possible non-normality of the underlying 
distribution (Art. 15). The user of this technique has the responsibility of 
verifying that the situation conforms to the assumptions, including that of nor- 
mality, underlying the tabulated probability criteria. But he is in a very poor 
position to do this in a large proportion of the applications actually made of the 
analysis of variance. Yet the analysis of variance in some form—possibly 
through the use of rank-order numbers or through a transformation or some other 
auxiliary device—remains the one powerful means of attacking a very large and 
important class of practical situations. The practicing statistician needs to do 
some highly educated guessing on such matters—guessing that will be assisted 
but not made determinate by knowledge of a considerable range of mathematical 
truths regarding approaches to the normal distribution, moments of the variance- 
ratio in samples from non-normal populations, asymptotic large-sample theory, 
and other such topics. His mathematical insight needs to be supplemented by 
consideration of the particular subject-matter of application. Moreover, it is 
desirable that students of statistics have some practice with actual empirical 
data designed to develop the art of guessing in such ways. 
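
As a rough illustration of the rank-order device mentioned above, the following Python sketch computes the one-way analysis-of-variance ratio first on the raw observations and then on their ranks in the pooled sample. The function names (`f_ratio`, `rank_transform`) and the toy data are assumptions introduced here for illustration, and the sketch does not settle how the resulting ratio should be referred to the tabulated criteria.

```python
from statistics import mean

def f_ratio(groups):
    """One-way analysis-of-variance F ratio for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def rank_transform(groups):
    """Replace each observation by its rank in the pooled sample (ties share the average rank)."""
    pooled = sorted(v for g in groups for v in g)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    return [[rank_of[v] for v in g] for g in groups]

# Hypothetical data: the last observation is a gross outlier.
data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 900.0]]
f_raw = f_ratio(data)
f_ranks = f_ratio(rank_transform(data))
print(f_raw, f_ranks)
```

With these made-up figures the single outlier swamps the ratio computed on the raw values, while the ratio computed on ranks still separates the three groups clearly.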

Another example of non-rigorous mathematics used extensively in statistics is 
the whole business of asymptotic standard errors found by the differential 
method. It is desirable that good mathematics replace bad in such connexions, 
but something is to be said for the position into which so many practical statis- 
ticians have been driven, that even bad mathematics may be better than none 
at all. The requisite good mathematics along these lines can come only through 
those who have made really serious studies of statistics, though a sufficiently
interested pure mathematician might eventually be led by such a student of 
statistics to undertake and complete the necessary research. Practical needs 
make approximations necessary; the goodness of a particular approximation can
often be judged adequately by a statistician familiar with the particular applica- 
tion long before the heavy artillery of advanced mathematical analysis can be 
brought up. 
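
The differential method referred to above (commonly called the delta method today) approximates the standard error of a function g of a sample mean by |g'(mu)| sigma / sqrt(n). The sketch below is a minimal Python illustration under that reading; the function name `differential_se`, the step size `h` and the numerical values are hypothetical, and nothing here bears on the rigor of the approximation.

```python
import math

def differential_se(g, mu, sigma, n, h=1e-6):
    """Approximate standard error of g(sample mean) by |g'(mu)| * sigma / sqrt(n)."""
    slope = (g(mu + h) - g(mu - h)) / (2 * h)  # central-difference estimate of g'(mu)
    return abs(slope) * sigma / math.sqrt(n)

# Hypothetical example: logarithm of a mean, with mu = 4, sigma = 1, n = 25.
se_log = differential_se(math.log, 4.0, 1.0, 25)
print(se_log)  # close to (1/4) * (1/5) = 0.05
```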

The teacher of statistics must have a genuine sympathy and understanding for 
applications, and these are not possessed by many pure mathematicians, at least 
in the opinion of some of those concerned with the applications; and it is this 
opinion rather than the possible fact that is of interest at the moment. For so
long as such an opinion is maintained, for example by psychologists and econ- 
omists, these specialists will be suspicious that courses in statistics given by a 
department consisting largely of pure mathematicians are unsuitable for their
purposes. The result is likely to be a sabotaging of attempts at centralization, 
the different departments reverting to the old and ultimately objectionable 
system of teaching their own separate courses in statistical methods. 

These difficulties are not necessarily insuperable, and it is to be expected that 
many medium-sized and small institutions will make their mathematical depart- 
ments responsible for statistical teaching. But this ought not to be done without
a consideration of the possible dangers. 

22. The statistical curriculum. We next consider curricular problems. These 
may be divided into those of the graduate school and those of the undergraduate 
college. Those of the graduate school may in turn be divided into those of
specialization in statistics and of auxiliary teaching of statistics to students in 
other departments, such as sociology, who need to use statistical methods, have 
not studied them sufficiently as undergraduates, and cannot afford to put much 
time on them. Of these two subdivisions the number of students at present is 
greater in the second and the ultimate importance is greater in the first, because 
the whole future of statistics depends on improvement and enlargement of this 
graduate teaching. 

The incidental teaching of elementary statistical methods to graduate students 
in other subjects, without any prerequisite in mathematics or statistics, cannot 
equip these students with a command of the subject at all comparable to that 
which could be obtained by a better integration of undergraduate with graduate 
work. A prospective sociologist, economist, psychologist, or physicist ought to 
study elementary statistical methods and concepts while still an undergraduate, 
and without special reference to his ultimate field of specialization. 

The features of statistical methods peculiar in their applications, beyond what 
is taught through illustrations and exercises in an elementary course, may be 
fit material for a course, graduate or undergraduate, in a department of the 
application. Such a course should require as a prerequisite an elementary course 
in a department of statistics, or at least one taught by specialists in statistical 
method and theory. 

For the undergraduate college, in place of the sporadic offerings now current in 
different departments, we recommend a combination of two general fundamental 
courses with a number of advanced courses. Of the latter some will be specialized
to the work of particular departments or groups of departments.

Of the two fundamental courses one will require calculus as a prerequisite, the 
other only a knowledge of first-year algebra. It is to be hoped that the less 
mathematical of these two general statistical courses, instead of being elected by 
a majority of students, will gradually approach extinction, while the course based 
on calculus will become the vital point of contact of the student body with the 
concepts of statistics. The chief reason for insisting upon the importance of 
calculus as a prerequisite is simply the possibility of covering important statistical 
theory that is inaccessible to those who do not have it. 

Modern statistical methods are based on the theory of probability. The 
general courses in statistics may therefore well begin with elementary probability. 
The duality between probability and statistical concepts,⁷ for example between
probability and relative frequency, between mathematical expectation and a 
sample mean, between parameter and statistic, should be explained. Deriva- 
tions and the place of the normal distribution should be sketched, and the Student 
distribution should be derived and applied to a variety of problems in the first 
course based on calculus. Later courses given by the department of statistics, 
or whoever specializes in statistical theory, will naturally cover other statistical 
methods and theories. At the same time useful courses can be offered in eco- 
nomic statistics, mental testing, and other fields using statistical methods by 
specialists, regardless of departmental affiliation. There might be departmental 
cooperation; for example, the department of statistics might offer elementary and
advanced courses in correlation and multivariate analysis, and the department of 
psychology might require these as prerequisites for some of its work in mental 
testing. 
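
The duality described above can be made concrete with a small simulation. The following Python sketch, which assumes a fair six-sided die as a hypothetical population and a fixed random seed, sets the probability of a six beside its relative frequency in a sample, and the mathematical expectation beside the sample mean.

```python
import random

random.seed(1948)  # fixed seed so that the run is repeatable
rolls = [random.randint(1, 6) for _ in range(10_000)]

probability = 1 / 6                                # parameter of the population
relative_frequency = rolls.count(6) / len(rolls)   # statistic from the sample

expectation = 3.5                                  # parameter of the population
sample_mean = sum(rolls) / len(rolls)              # statistic from the sample

print(probability, relative_frequency)
print(expectation, sample_mean)
```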

The teaching of statistics should be accompanied by considerable work in 
applied statistical problems, as well as exercises in mathematical theory, on the 
part of the students. A large part of this work in applied statistics is best con- 
ducted in a laboratory equipped with calculating machines, mathematical tables, 
drafting instruments, and other appurtenances. 

Statistical laboratories require supervision, administration and maintenance.
They are needed not only for the purpose of teaching statistics, pure and applied, 
at all levels, but also by research workers in many fields. There are possible 
gains of efficiency and economy in a centralized administration of them. One 
suggestion is that they be under the supervision of the university library. 
Another is that responsibility for them be lodged in a central department of 
statistics, or in a two-department statistical institute. Centralization can be 
carried too far, and it is likely that some units in a large organization will find 
it advantageous to have machines which are exclusively their own. The con- 
flicting claims regarding machines and laboratories will require careful weighing. 

23. Statistical method as a part of a liberal education. A question may also 
be raised as to whether some work in the statistical method should not be required
of all college students as a part of a liberal education. This would be
a novel step, but has much to be said for it in view of the widespread use of
statistics and growing interest in statistics. Another point is that the student
who can’t make up his mind as to his ultimate field of specialization or vocation
will do well to study those things that can be used in many fields. Of such things,
mathematics and statistics are leading examples. There are more or less sound
objections to systems of required studies; but if we are to have them, the claim
of statistics should not be rejected merely on grounds of novelty.

7 Cf. the article "Frequency distribution," Encycl. of the Social Sciences (1931).











ABSTRACTS OF PAPERS 
Presented December 22, 1947 at the Berkeley Meeting of the Institute 


1. The Performance Characteristic of Certain Methods for Obtaining Confidence 
Intervals. B. M. Bennett and J. Neyman, University of California,
Berkeley. 


Certain methods for obtaining confidence intervals have been introduced by Bliss, R. A.
Fisher and Paulson. Thus, e.g., let $x_i, y_i$ $(i = 1, \dots, n)$ represent a sample from a
bivariate normal population with means $E(x) = \xi$, $E(y) = \alpha\xi$ and variances and
covariance $\sigma_x^2, \sigma_y^2, \sigma_{xy}$. If $\bar{x}, \bar{y}, S_x^2, S_y^2, S_{xy}$ are the sample means, variances
and covariance respectively, then in order to determine confidence limits for $\alpha$, the ratio:

$$u = \frac{\sqrt{n}\,(\bar{y} - \alpha\bar{x})}{\sqrt{S_y^2 - 2\alpha S_{xy} + \alpha^2 S_x^2}}$$

may be referred to the appropriate value $t$ of the Student-t distribution. The inequality
$|u| < t$ may, in general, be solved as a quadratic equation in $\alpha$ to yield two values
$\underline{\alpha}, \overline{\alpha}$ which are presumed to be confidence limits for $\alpha$. In this paper the probability
$\pi$ of being correct in using such a procedure, i.e., the performance or operating
characteristic, is computed in the limiting case when $\sigma_x$, $\sigma_y$, $\sigma_{xy} = \rho\sigma_x\sigma_y$ are assumed
to be known. It is shown that $\pi$ is a function $\pi(\alpha, \xi, \sigma_x, \sigma_y, \rho)$ of all the parameters,
and in particular of $\alpha$ itself, the quantity for which confidence limits are supposed to
be provided. Similar "quadratic" methods are also used in certain regression problems,
e.g., in determining confidence limits for a value of $x$ corresponding to an additional
value of $y$ when a previous sample regression of $y$ on $x$ is available; or in determining
confidence limits for the intersection point of two population regression lines. The
performance characteristic of each of these methods is shown to be a function of the
quantity for which the method gives confidence limits.
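
The "quadratic" procedure described in this abstract can be sketched in Python as follows. This is only an illustration under stated assumptions: the ratio is taken as u = sqrt(n)(ybar - alpha xbar) / sqrt(S_y^2 - 2 alpha S_xy + alpha^2 S_x^2), the sample variances and covariance use the divisor n - 1, the critical value `t_crit` is supplied by the caller from a table, and the function names and data are hypothetical.

```python
import math
from statistics import mean

def _moments(xs, ys):
    """Sample means, variances and covariance (divisor n - 1 is an assumption)."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    syy = sum((y - ybar) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)
    return n, xbar, ybar, sxx, syy, sxy

def u_stat(xs, ys, alpha):
    """The ratio u for a trial value of alpha."""
    n, xbar, ybar, sxx, syy, sxy = _moments(xs, ys)
    return math.sqrt(n) * (ybar - alpha * xbar) / math.sqrt(
        syy - 2 * alpha * sxy + alpha ** 2 * sxx)

def ratio_limits(xs, ys, t_crit):
    """Solve |u| < t_crit as a quadratic in alpha; None if no bounded interval results."""
    n, xbar, ybar, sxx, syy, sxy = _moments(xs, ys)
    a = n * xbar ** 2 - t_crit ** 2 * sxx
    b = -2 * (n * xbar * ybar - t_crit ** 2 * sxy)
    c = n * ybar ** 2 - t_crit ** 2 * syy
    disc = b * b - 4 * a * c
    if a <= 0 or disc < 0:
        return None
    root = math.sqrt(disc)
    return ((-b - root) / (2 * a), (-b + root) / (2 * a))

# Hypothetical paired data; 2.571 is the tabulated two-sided 5% point of t for 5 d.f.
xs = [4.1, 5.0, 3.8, 4.6, 5.2, 4.4]
ys = [8.3, 9.9, 7.4, 9.4, 10.1, 8.8]
limits = ratio_limits(xs, ys, 2.571)
print(limits)
```

The sketch only reproduces the mechanics of solving the inequality; it says nothing about the probability that the resulting interval covers the true ratio, which is the subject of the abstract.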


2. Some Further Results on the Bernoulli Process. T. E. Harris, Douglas 
Aircraft Co. 


Let $z_1, z_2, z_3, \dots$ be a sequence of random variables defined as follows: $P(z_1 = r) = p_r$,
$r = 0, 1, 2, \dots, k$. If $z_n = 0$, $z_{n+1} = 0$. If $z_n = r$, $r \neq 0$, then $z_{n+1}$ is distributed as the
sum of $r$ independent random variables, each having the same distribution as $z_1$. It is
assumed that $x < 1$, where $x = E(z_1)$. Let $N$ be the smallest value of $n$ such that $z_{n+1} = 0$.
A method is given for obtaining an expansion of the moment-generating function of $N$.
In the case where $p_r = 0$ for $r \geq 3$, this expansion takes the form $1 + (1 - e^{-s})(1 - p_0)F(s)$,
where $F(s) = f_1(s) - p_2(1 - p_0)f_2(s) + 2x p_2^2 (1 - p_0)^2 f_3(s) - \cdots$, where
$f_1(s) = (e^{-s} - x)^{-1}$, and $f_n(s) = f_{n-1}(s)(e^{-s} - x^n)^{-1}$. Certain restrictions on the constants $p_r$
insure that this expansion converges for a complex neighborhood of $s = 0$.
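
As a numerical check on an expansion of this kind, the Python sketch below compares a direct summation of the moment-generating function of N with its first three terms. It assumes the reading F(s) = f_1(s) - p_2(1 - p_0) f_2(s) + 2 x p_2^2 (1 - p_0)^2 f_3(s) - ..., with f_1(s) = 1/(e^{-s} - x) and f_n(s) = f_{n-1}(s)/(e^{-s} - x^n), and the case p_r = 0 for r >= 3; the function names and the offspring probabilities are hypothetical.

```python
import math

def mgf_direct(p0, p1, p2, s, terms=500):
    """E[exp(s*N)] summed directly, using P(N >= n) = P(z_n != 0) = q_n."""
    x = p1 + 2 * p2              # mean of z_1, assumed less than 1
    q = 1 - p0                   # q_1 = P(z_1 != 0)
    total = 0.0
    for n in range(1, terms + 1):
        total += math.exp(s * n) * q
        q = x * q - p2 * q * q   # q_{n+1} = 1 - f(1 - q_n) for f(t) = p0 + p1 t + p2 t^2
    return 1 + (1 - math.exp(-s)) * total

def mgf_three_terms(p0, p1, p2, s):
    """First three terms of the assumed expansion 1 + (1 - e^{-s})(1 - p0) F(s)."""
    x = p1 + 2 * p2
    e = math.exp(-s)
    f1 = 1 / (e - x)
    f2 = f1 / (e - x ** 2)
    f3 = f2 / (e - x ** 3)
    w = 1 - p0
    big_f = f1 - p2 * w * f2 + 2 * x * p2 ** 2 * w ** 2 * f3
    return 1 + (1 - e) * w * big_f

# Hypothetical probabilities with mean x = 0.6, evaluated at s = 0.1.
direct = mgf_direct(0.5, 0.4, 0.1, 0.1)
series = mgf_three_terms(0.5, 0.4, 0.1, 0.1)
print(direct, series)
```

The two values should agree closely when p_2(1 - p_0) is small, since the omitted terms carry higher powers of that product.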


3. Most Powerful Tests of Composite Hypotheses I. Normal Distributions. 
E. L. Lehmann and C. M. Stein, University of California, Berkeley,
California. 


Critical regions are determined for testing a composite hypothesis, which are most power- 
ful against a particular alternative among all critical regions whose probabilities under the 
hypothesis tested are bounded above by the level of significance. These problems have 
been considered by Neyman, Pearson and others, subject to the condition that the critical 
region be similar. In testing the hypothesis specifying the value of the variance of a normal
distribution with unknown mean against an alternative with larger variance, and in some 
other problems, the best similar region is also most powerful in the sense of this paper. 
However, in the analogous problem when the variance under the alternative hypothesis 
is less than that under the hypothesis tested, in the case of Student’s hypothesis when the 
level of significance is less than ½, and in some other cases, the best similar region is not
most powerful in the sense of this paper. There exist most powerful tests which are quite 
good against certain alternatives in some cases where no proper similar region exists. 
These results indicate that in some practical cases the standard test is not best if the class 
of alternatives is sufficiently restricted. 


4. On the Selection of Forecasting Formulas. Paul G. Hoel, University of
California, Los Angeles, California. 


Given two competing formulas, $u = g(x_1, \dots, x_m)$ and $v = h(x_1, \dots, x_m)$, for forecast-
ing a variable $z$, a significance test possessing optimum properties is designed for deciding
whether one formula yields significantly better forecasts than the other. The test, which
turns out to be a Student $t$ test, is constructed as a test of the hypothesis $H_0: m_i = u_i$ against
the alternative $H_1: m_i = v_i$ $(i = 1, \dots, n)$, in which it is assumed that the variables
$z_1, \dots, z_n$, corresponding to the $n$ samples, are independently normally distributed with
means $m_i$ and variances $\sigma_i^2 = \sigma^2$.


5. On the Power Function of the “Best” t-test Solution of the Behrens-Fisher
Problem. J. E. Walsh, Douglas Aircraft Company


The most powerful t-test solution of the Behrens-Fisher problem (one-sided and sym-
metrical) was obtained by Scheffé in Annals of Mathematical Statistics, Vol. 14 (1943), pp.
35-44. This note derives (approximately) the power efficiency of this t-test for the case
in which the ratio of the variances of the normal populations is also known. Let the t-test
be based on $m$ sample values from the first normal population and $n$ sample values from the
second normal population, where $m \leq n$. For fixed values of $m$ and $n$, a symmetrical
t-test with significance level $2\alpha$ has the same power efficiency as a one-sided t-test with
significance level $\alpha$. For one-sided t-tests with significance level $\alpha$, the power efficiency
is approximately $50[B + \sqrt{B^2 - 8(m + n)A}]/(m + n)$, where $B = 2 + (m + n)A + K_\alpha^2/2$,
$A = 1 - K_\alpha^2/[2(m - 1)]$, and $K_\alpha$ is the standardized normal deviate exceeded with probability
$\alpha$. This approximation is reasonably accurate for $m > 4$ if $\alpha = .05$, $m > 5$ if $\alpha = .025$,
$m > 6$ if $\alpha = .01$, $m > 7$ if $\alpha = .005$. Intuitively the power efficiency of a test measures
the percentage of available information per observation which is utilized by that test.
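
The approximation quoted in this abstract is easy to tabulate. The Python sketch below reads it as 50[B + sqrt(B^2 - 8(m + n)A)]/(m + n), with B = 2 + (m + n)A + K^2/2 and A = 1 - K^2/(2(m - 1)), where K is the normal deviate exceeded with probability alpha; the squares on K are an assumption about the intended formula, and the function name `power_efficiency` is hypothetical.

```python
import math
from statistics import NormalDist

def power_efficiency(m, n, alpha):
    """Approximate power efficiency (per cent) of the one-sided t-test, as read above."""
    k = NormalDist().inv_cdf(1 - alpha)   # deviate exceeded with probability alpha
    a = 1 - k ** 2 / (2 * (m - 1))
    b = 2 + (m + n) * a + k ** 2 / 2
    return 50 * (b + math.sqrt(b ** 2 - 8 * (m + n) * a)) / (m + n)

# Hypothetical sample sizes within the stated range of accuracy (m > 4 at alpha = .05).
eff = power_efficiency(10, 10, 0.05)
print(round(eff, 1))
```

One check on this reading is that the expression reduces to exactly 100 per cent when K = 0, since B^2 - 8(m + n) is then the perfect square (m + n - 2)^2.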





6. On Sequences of Experiments. CHARLES STEIN, University of California, 
Berkeley, California. 


One performs a sequence of N experiments to decide between two simple hypotheses 
regarding probability distributions of certain observable quantities. At each stage there 
is a choice among L experiments and the one chosen yields a random variable. One wishes 
to achieve certain upper bounds $\alpha$ and $\beta$ to the probabilities of first and second kind errors
respectively, and, subject to these restrictions, to minimize the expected cost under a third 
hypothesis. The cost of each particular sequence of experiments is known. A solution 
is obtained, essentially by applying Lagrange’s method and working back from the end 
of the experiment. This can be generalized to multiple decision problems. The results 
are applied to two-sample tests with the second sample of variable size, and to Wald’s 
sequential analysis. As another problem, suppose $(X_1, Y_1), (X_2, Y_2), \cdots$ are independently
distributed with bivariate normal distributions having mean $\xi$ and covariance matrix
$\Sigma$, both unknown. One tests $H_0 : \xi = 0$ against $H_1 : \xi'\Sigma^{-1}\xi = \delta$. A test (not necessarily
optimum) valid within the usual approximation is obtained from the ratio of the p.d.f.











118 ABSTRACTS OF PAPERS 


of Hotelling’s $T^2$ under $H_1$ to that under $H_0$. Analogous results hold for the multiple
correlation coefficient, ratio of two variances and test for linear hypothesis. 


7. The Effect of Selection Above Definite Lower Limits of Linear Functions of 
Normally Distributed Correlated Variables on the Means and Variances of 
Other Linear Functions. G. A. BAKER, University of California, Davis,
California. 


Sometimes certain variables in a system can be observed before other economically or 
socially important variables. These variables or linear combinations of them can be used 
as a basis of selection at given levels. The question is: How does selection on these earlier
or more easily available variables affect the mean and variance of the economically or
socially more important variables or, perhaps, linear functions of the more important
variables? The general procedure is clear. We transform to a new system of variables which
contains the linear functions on which selection is performed and the linear functions of 
which the means and variances are required as separate variables. The remaining new 
variables are eliminated by integration. The final calculation involves the numerical 
evaluation of integrals whose integrands are the product of polynomials and normal multi- 
variate functions and whose limits depend on the given levels of selections. The general 
ideas are simple but the actual labor of computation in a given case is tedious. An example 
is considered in detail. 
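
The simplest special case of this procedure has a closed form and is sketched below: one standardized selection variable X, selection of all cases with X above a cutoff c, and one other standardized variable Y with correlation rho to X. The expressions used are the standard results for a truncated bivariate normal distribution, not formulas taken from the abstract, and the function name is hypothetical.

```python
from statistics import NormalDist


def selected_mean_and_variance(rho: float, cutoff: float) -> tuple:
    """Mean and variance of Y after selecting cases with X > cutoff.

    X and Y are assumed standard normal with correlation rho.
    """
    std = NormalDist()
    # Inverse Mills ratio: the mean of X given X > cutoff.
    lam = std.pdf(cutoff) / (1.0 - std.cdf(cutoff))
    mean_y = rho * lam
    # Var(X | X > c) = 1 + c*lam - lam^2, carried over to Y through rho^2.
    var_y = 1.0 - rho * rho * lam * (lam - cutoff)
    return mean_y, var_y


if __name__ == "__main__":
    for rho in (0.3, 0.6, 0.9):
        print(rho, selected_mean_and_variance(rho, 1.0))
```

The general case in the abstract, with several linear functions and several cutoffs, needs the numerical integration described there; this sketch only shows the direction and size of the effect in the one-variable case.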


8. An Inversion Formula for the Distribution of a Ratio of Random Variables. 
J. GURLAND, University of California, Berkeley, Calif.


The repeated Cauchy principal value of integrals applied to characteristic functions is
used in obtaining inversion formulae for distribution functions. Let the random variables
$X_1$ and $X_2$ have a joint distribution function with corresponding characteristic function
$\phi(t_1, t_2)$. Suppose $P\{X_2 \le 0\} = 0$. Let

\[ \oint g(t)\,dt = \lim_{\epsilon \to 0,\ T \to \infty} \left( \int_{-T}^{-\epsilon} + \int_{\epsilon}^{T} \right) g(t)\,dt \]

for any function $g(t)$. If $G(x)$ is the distribution function of $X_1/X_2$ then

\[ G(x) + G(x - 0) = 1 - \frac{1}{\pi i} \oint \frac{\phi(t, -tx)}{t}\,dt. \]

This formula is free of restrictions which accompany the formula
given by Cramér in the case where $X_1$ and $X_2$ are independent; and differentiation extends
a result of Geary to a much larger class of distribution functions. Further generalizations
of the theory are obtained, and as an example the distribution function of the ratio of
quadratic forms of random variables $X_1, X_2, \cdots, X_n$ is considered in the case where
$X_1, X_2, \cdots, X_n$ have a multivariate normal distribution.


9. Independence of Parameters and Sufficient Statistics. E. W. BARANKIN,
University of California, Berkeley, California. 


The notions of complete set of independent parameters and minimal set of sufficient statistics
are suitably defined for a class of families of probability densities
$\{p(x_1, \cdots, x_n;\ \theta_1, \cdots, \theta_s)\}$, and the order of each of these sets is determined as the rank of a certain
matrix. Second order continuous differentiability is eventually required of the function p;
and certain other conditions are laid down, designed to ensure that the behavior of p in
the large is similar to its behavior in the small when only continuous differentiability is
assumed. The problem of determining the order of a minimal set of sufficient statistics
is made, by a certain device, to become identical in character with that of finding the order
of a complete set of independent parameters. (This is in the nature of these concepts.)




An explicit method is given for finding a complete set of independent parameters and a 
minimal set of sufficient statistics. 




(Presented December 30, 1947 at New York at the Annual Meeting of the Institute) 


1. Distribution of the Circular Serial Correlation Coefficient for Residuals from 
a Fitted Fourier Series (Preliminary Report). R. L. ANDERSON, University 
of North Carolina, Raleigh, North Carolina and T. W. ANDERSON, Columbia
University. 


Consider a set of N observations $\{X_i\}$ defined as follows:

\[ X_i - \mu_i = \rho\,(X_{i-L} - \mu_{i-L}) + \epsilon_i, \]

where the residuals $\{\epsilon_i\}$ are assumed to be normally and independently distributed with
zero means and equal variances and L is the lag. A statistic for testing the null
hypothesis $\rho = 0$ is ${}_LR$, the circular serial correlation coefficient of residuals $e_i$ from a regression
line fitted by least squares: $X_i = M_i + e_i$. The following regression line is considered:

\[ M_i = a_0 + \sum{}' a_k \cos\frac{2\pi k i}{N} + \sum{}' b_k \sin\frac{2\pi k i}{N}, \]

where k ranges over some subset of the integers $1, 2, \cdots, \tfrac{1}{2}(N - 1)$ or $\tfrac{1}{2}N$, depending
on whether N is odd or even (if N is even, $b_{N/2}$ is not used). Hence ${}_LR$ is defined as:

\[ {}_LR = \frac{e_1 e_{L+1} + e_2 e_{L+2} + \cdots + e_N e_{L+N}}{\sum e_i^2}, \]

with $e_{N+i} = e_i$.

The distribution of this ${}_LR$ has the same general form as that presented by R. L. Anderson
for $\rho = 0$ [“Distribution of the serial correlation coefficient,” Annals of Math. Statistics
13:1-13 (1942)]; and for $\rho \neq 0$ by W. G. Madow [“Note on the distribution of the serial
correlation coefficient,” Annals of Math. Statistics 16:308-310 (1945)].

For $M_i$ consisting of terms of only one period (2, 3, 4, 6, 12 and 24), exact values
of the 1% and 5% significance levels of ${}_1R$ have been computed for N = 12 and 24.
Approximate significance levels have been computed for N = 12(12)96. More of the exact
significance levels are being computed, and all computations will be extended to include
some multiple periods and some lags greater than 1.
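
The statistic described above can be sketched in a few lines. The code below fits a constant plus cosine and sine terms at chosen frequencies k by least squares, using the fact that these regressors are mutually orthogonal over i = 1, ..., N, and then forms the circular serial correlation of the residuals at lag L as the sum of e_i e_{i+L} (indices taken modulo N) divided by the sum of e_i squared. The function names are hypothetical, and the sketch computes only the statistic, not its significance levels.

```python
from math import cos, sin, pi


def fourier_residuals(x, ks):
    """Residuals of x_1..x_N from a least-squares fit of a constant plus
    cosine and sine terms at each frequency k in ks."""
    n = len(x)
    idx = range(1, n + 1)
    fitted = [sum(x) / n] * n  # the constant term is the sample mean
    for k in ks:
        c = [cos(2 * pi * k * i / n) for i in idx]
        s = [sin(2 * pi * k * i / n) for i in idx]
        # Orthogonal regressors: each coefficient is a simple projection.
        a_k = sum(xi * ci for xi, ci in zip(x, c)) / sum(ci * ci for ci in c)
        ss = sum(si * si for si in s)
        # The sine term vanishes identically at k = N/2, so it is skipped there.
        b_k = sum(xi * si for xi, si in zip(x, s)) / ss if ss > 1e-12 else 0.0
        fitted = [f + a_k * ci + b_k * si for f, ci, si in zip(fitted, c, s)]
    return [xi - fi for xi, fi in zip(x, fitted)]


def circular_serial_correlation(e, lag):
    """Sum of e_i * e_(i+lag), indices wrapping around, over sum of e_i^2."""
    n = len(e)
    num = sum(e[i] * e[(i + lag) % n] for i in range(n))
    return num / sum(ei * ei for ei in e)


if __name__ == "__main__":
    series = [3 + 2 * cos(2 * pi * i / 12) + (-1) ** i for i in range(1, 13)]
    resid = fourier_residuals(series, [1])
    print(circular_serial_correlation(resid, 1))
```

In the demonstration series the fitted annual term leaves a purely alternating residual, so the lag-1 coefficient is at its lower extreme.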


2. Some New Methods for Distributions of Quadratic Forms. HAROLD
HOTELLING, Institute of Statistics, University of North Carolina, Chapel Hill.


Any homogeneous quadratic form in normally distributed variates of zero means has
the same distribution as $q = \tfrac{1}{2}(a_1 z_1^2 + \cdots + a_n z_n^2)$, where the $a_i$ are roots of a determinantal
equation based on the coefficients of the given form and the parameters of the normal
distribution, and where the $z_i$ are normally and independently distributed with zero means
and unit variances. We take $\sum a_i = n$, and begin by expanding the distribution of a positive
definite form in a series of powers of q whose coefficients are polynomials in the reciprocals
of the $a_i$. This series shows the analyticity of the function, which is then expressed as
the product of a $\chi^2$ distribution function and a series of Laguerre polynomials with coefficients
which are simple polynomials in the moments of the $a_i$. Indefinite forms and certain ratios
of forms are dealt with by convolutions of these series and by other means.













3. Frequency Functions Defined by the Pearson Difference Equation. LEO
KATZ, Michigan State College, East Lansing, Michigan.


Frequency “links” formed from the Pearson difference equation provide an efficient
means of fitting functions to observed distributions. These links, involving three constants 
which are determined by the first four moments of the observed series, correspond to a 
three-parameter family of discrete frequency functions. This family of functions is just 
as broad as that defined by the differential equation, containing functions of equally diverse 
types; in addition, it has the very important advantage that the graduation process is the 
same for any type. Further, the simpler functions of the family all correspond to points 
lying in one plane of the parameter space. This plane, giving a two-parameter family 
of functions (depending upon the first three moments), is studied intensively, rather com- 
plete results being obtainable for areas, moments, sampling characteristics of moments, 
etc. It is also shown that the problem of discrimination among simple discrete frequency 
functions for graduating observed data is resolvable (in the plane) to the sampling distri- 
bution of one statistic. A special case of the two-parameter family depending on only the 
first two moments was previously discussed. 


4. Distribution of the Sum of Roots of a Determinantal Equation under a 
Certain Condition. D. N. NANDA, Institute of Statistics, University of North
Carolina, Chapel Hill. 


Let $x = \|x_{ij}\|$ and $x^* = \|x^*_{ij}\|$ be two p-variate sample matrices with $n_1$ and $n_2$ degrees
of freedom. Then $S = xx'/n_1$ and $S^* = x^*x^{*\prime}/n_2$ are, under the null hypothesis, independent
estimates of the same population covariance matrix. The distribution of a root,
specified by its rank order, of the determinantal equation $|A - \theta(A + B)| = 0$, where $A = n_1 S$
and $B = n_2 S^*$, has already been given by S. N. Roy, and by the author, who has also
obtained the limiting distribution of any root when one of the samples becomes infinitely
large. The moment generating function of the sum of the roots when $n_1 = p + 1$ can be
derived from the limiting distribution of the largest root. The probability distributions 
of the sum of roots under this condition have been formulated for the determinantal equa- 
tions having two, three, and four roots. The moments of these distributions have also 
been obtained. The method is applicable for the determinantal equation of any order. 
These probability distributions can easily be tabulated, as they involve only simple al- 
gebraic and incomplete beta functions. 
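
For two variates the roots of the determinantal equation |A - theta(A + B)| = 0 are the solutions of a quadratic, which makes the quantity under study easy to compute. The sketch below expands the 2 x 2 determinant by hand and returns both roots; their sum is the statistic whose distribution the abstract treats. The function names are hypothetical, and A and B are taken as given symmetric positive definite matrices.

```python
from math import sqrt


def det2(m):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def roots_2x2(a, b):
    """Roots theta of |A - theta (A + B)| = 0 for 2 x 2 matrices A and B."""
    c = [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]
    # |A - theta C| = det(C) theta^2 - mixed * theta + det(A)
    mixed = (a[0][0] * c[1][1] + a[1][1] * c[0][0]
             - a[0][1] * c[1][0] - a[1][0] * c[0][1])
    quad, const = det2(c), det2(a)
    disc = sqrt(max(mixed * mixed - 4.0 * quad * const, 0.0))
    return (mixed - disc) / (2.0 * quad), (mixed + disc) / (2.0 * quad)


if __name__ == "__main__":
    small, large = roots_2x2([[1.0, 0.0], [0.0, 3.0]], [[3.0, 0.0], [0.0, 1.0]])
    print(small, large, small + large)
```

For higher orders the same roots are the eigenvalues of A(A + B)^{-1}, which would normally be obtained from a linear algebra library rather than by expanding the determinant.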


5. Applications of Carnap’s Probability Theory to Statistical Inference. 
GERHARD TINTNER, Iowa State College, Ames, Iowa. 


The new theory of probability of Rudolf Carnap (“On inductive logic,” Philosophy of
Science, vol. 12, 1945, pp. 72 ff.; “The two concepts of probability,” Philosophy and
Phenomenological Research, vol. 5, 1944, pp. 513 ff.) introduces a distinction between
probability$_1$, the degree of confirmation, and probability$_2$, related to relative frequency. It is
believed that the ideas developed are useful in clarifying the problems of statistical
inference.

As an example, consider the case of “inverse inference,” i.e. inference from a sample to
the population. The evidence is that in a sample of size s there are $s_1$ individuals with
a certain property M and $s_2 = s - s_1$ without the property. The hypothesis is that in the
population consisting of n individuals there are $n_1$ individuals with property M and
$n_2 = n - n_1$ individuals without this property. The degree of confirmation is then:

\[ c^* = \frac{\dbinom{n_1 + w_1 - 1}{s_1 + w_1 - 1} \dbinom{n_2 + w_2 - 1}{s_2 + w_2 - 1}}{\dbinom{n + k - 1}{s + k - 1}}. \]




In this formula we have: $w_1$ the logical width of the property M, $w_2$ the logical width of
the property non-M, $k = w_1 + w_2$. It should be noted that for $w_1 = w_2 = 1$ the formula
becomes the classical result, i.e. a term of the hypergeometric distribution.

This idea may be applied to statistical estimation. We could for instance choose $n_1$
in such a fashion that $c^*$ becomes a maximum. This would be estimation by the principle
of maximum degree of confirmation, analogous to maximum likelihood. In a similar fashion
we may also use $c^*$ to establish limits for $n_1$ similar to confidence or fiducial intervals.
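
The estimation idea can be tried numerically. The sketch below assumes that the degree of confirmation has the form C(n1 + w1 - 1, s1 + w1 - 1) C(n2 + w2 - 1, s2 + w2 - 1) / C(n + k - 1, s + k - 1), where C denotes a binomial coefficient, and that the sample is part of the population. This form is a reading of the displayed formula rather than a quotation, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def degree_of_confirmation(n, n1, s, s1, w1=1, w2=1):
    """Assumed form of c* for the hypothesis that n1 of n individuals have M."""
    n2, s2, k = n - n1, s - s1, w1 + w2
    if n1 < s1 or n2 < s2:
        return Fraction(0)  # hypothesis contradicts the sample
    num = comb(n1 + w1 - 1, s1 + w1 - 1) * comb(n2 + w2 - 1, s2 + w2 - 1)
    return Fraction(num, comb(n + k - 1, s + k - 1))


def max_confirmation_estimate(n, s, s1, w1=1, w2=1):
    """Value of n1 that maximizes the degree of confirmation."""
    candidates = range(s1, n - (s - s1) + 1)
    return max(candidates,
               key=lambda n1: degree_of_confirmation(n, n1, s, s1, w1, w2))


if __name__ == "__main__":
    print(max_confirmation_estimate(n=10, s=4, s1=3))
    print([str(degree_of_confirmation(10, n1, 4, 3)) for n1 in range(3, 10)])
```

Summing the values over all admissible n1 gives one, so the same function could be accumulated from either end to obtain limits for n1 of the kind mentioned in the text.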


6. Circular Probable Error of an Elliptical Gaussian Distribution. HALLETT H.
GERMOND, S. W. Marshall & Co., Consulting Engineers, Washington, D. C.


Preliminary tables are presented, giving the radii of distribution-centered circular 
cylinders enclosing various percentages of the volume under an elliptical bivariate Gaussian 
surface. These tables are further interpreted in terms of a correlated bivariate Gaussian 
distribution. The application of these tables to impact analysis is illustrated. 
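
A table entry of this kind can be approximated numerically. The sketch below assumes independent zero-mean components with standard deviations sx and sy along the principal axes (a correlated distribution can be rotated to this form), integrates the density over a centered circle in polar coordinates with the radial integral done in closed form, and finds the radius enclosing a given fraction by bisection. The function names, step count and search bracket are illustrative choices, not taken from the abstract.

```python
from math import cos, sin, exp, pi


def prob_within_radius(r, sx, sy, steps=2000):
    """P(X^2 + Y^2 <= r^2) for independent X ~ N(0, sx^2), Y ~ N(0, sy^2)."""
    total = 0.0
    for j in range(steps):
        theta = 2.0 * pi * (j + 0.5) / steps
        g = cos(theta) ** 2 / (2.0 * sx * sx) + sin(theta) ** 2 / (2.0 * sy * sy)
        # Radial integral of rho * exp(-g * rho^2) from 0 to r, in closed form.
        total += (1.0 - exp(-g * r * r)) / (2.0 * g)
    return total * (2.0 * pi / steps) / (2.0 * pi * sx * sy)


def circular_probable_error(sx, sy, fraction=0.5):
    """Radius of the centered circle enclosing the given fraction."""
    lo, hi = 0.0, 10.0 * max(sx, sy)
    for _ in range(60):  # bisection on the monotone coverage function
        mid = 0.5 * (lo + hi)
        if prob_within_radius(mid, sx, sy) < fraction:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for ratio in (1.0, 0.5, 0.25):
        print(ratio, round(circular_probable_error(1.0, ratio), 4))
```

In the circular case sx = sy the coverage is 1 - exp(-r^2 / (2 sx^2)), so the 50 percent radius is sx times the square root of 2 ln 2, which the sketch reproduces.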


(Presented December 29, 1947 at the Chicago Meeting of the Institute) 


1. The Asymptotic Analogue of the Theorem of Cramér and Rao. HERMAN
RUBIN, Institute for Advanced Study, Princeton, N. J.


The author generalizes the results of Cramér and Rao on the minimum variance of es- 
timates to the case of the asymptotic distribution of an estimate. He shows that if certain 
regularity conditions are satisfied, the formula given by Cramér and Rao remains valid. 
The main results are obtained in the case of consistent estimates, but with a stronger set 
of hypotheses, the results remain true for estimates which are not consistent. The method 
used to obtain these results is to construct statistics to which the theorem of Cramér and 
Rao can be applied, and whose variance converges to the variance of the limiting distribu- 
tion. This procedure is also applied to the case in which there is no limiting distribution, 
and in which two sequences of distributions are considered which act as if they approach 
each other. 











BOOK REVIEWS 


Sequential Analysis. Abraham Wald. John Wiley and Sons, Inc. pp. vi, 212.
$4.00.


REVIEWED BY M. A. GIRSHICK 


Douglas Aircraft Company 


The development of sequential analysis as a new tool of statistics is by and 
large the work of Abraham Wald. This fact in itself would make the appear- 
ance of a book by him on this subject an important event. However, Wald in 
this book did more than discuss the present status of sequential theory. He 
has, in fact, written a very lucid treatise on the general subject of statistical 
inference—a treatise which is likely to have great influence on statistical think- 
ing. 

While this book is not written for the mathematically untrained, a knowledge 
of differential and integral calculus will suffice to follow all the arguments ex- 
cept perhaps for some sections in the appendix where the more complicated 
proofs have been placed. 

The main body of this book is divided into 3 parts and 11 chapters. Part I, 
covering chapters 1 to 4 inclusive, deals with the general theory of the sequential 
probability ratio test. Chapter 1 introduces in an elementary fashion the no- 
tion of probability distributions, tests of hypotheses and the Neyman-Pearson 
theory of two-valued decisions based on a fixed sample size. In Chapter 2, 
the general notion of a sequential test procedure is introduced and the operating 
characteristics of such tests are discussed. Chapter 3 deals with the sequential 
probability ratio test for testing a single hypothesis against a single alternative. 
Here the boundaries of this sequential criterion are expressed in terms of the 
risks, the operating characteristic and the average sample number functions 
are developed and bounds are obtained for the errors arising from truncation 
and neglect of excess over the boundaries. Chapter 4 presents a sequential
theory for testing simple and composite hypotheses against a set of alternatives. 
The fundamental idea introduced is the concept of a weight function in the 
parameter space which permits handling composite hypotheses, or simple hypo- 
theses with many alternatives, by means of the sequential probability ratio 
test. 
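
For readers who have not met the procedure, a minimal sketch of the sequential probability ratio test for a binomial parameter follows. It uses Wald's approximate boundaries expressed in terms of the two risks, log((1 - beta)/alpha) and log(beta/(1 - alpha)), which neglect the excess over the boundaries. The function name and the example values are illustrative and are not taken from the book or from this review.

```python
from math import log


def sprt_bernoulli(observations, p0, p1, alpha, beta):
    """Return (decision, n) for the test of p = p0 against p = p1 > p0."""
    upper = log((1.0 - beta) / alpha)   # accept H1 at or above this value
    lower = log(beta / (1.0 - alpha))   # accept H0 at or below this value
    llr = 0.0                           # cumulative log probability ratio
    n = 0
    for n, x in enumerate(observations, start=1):
        llr += log(p1 / p0) if x else log((1.0 - p1) / (1.0 - p0))
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "continue sampling", n


if __name__ == "__main__":
    print(sprt_bernoulli([0, 0, 1, 0, 1, 1, 1], 0.1, 0.3, 0.05, 0.05))
```

The number of observations is not fixed in advance: the test stops as soon as the cumulative log ratio leaves the interval between the two boundaries.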

Part II of this book, consisting of chapters 5 to 9 inclusive, deals with the 
applications of sequential analysis to special problems. Chapter 5 contains a 
discussion of the binomial case with specific reference to lot-by-lot acceptance 
inspection. Of special interest in this chapter is the derivation of the exact 
characteristic function for a large class of tests and the development of upper 
and lower limits for the effect of grouping on the OC and ASN curves. Chap- 
ter 6 deals with the problem of double dichotomies. A procedure for testing 
the difference between the parameters of two binomial distributions is developed 



for the fixed size as well as the sequential procedure. Chapters 7, 8, and 9 are 
concerned with the application of sequential analysis to the normal distribution. 
In these chapters the sequential probability ratio test is applied to hypotheses 
concerning the mean of a normal distribution when the variance is known, when 
the variance is not known (non-central t case) and hypotheses concerning the 
variance when the mean is known and when the mean is not known. 

Part III consists of two short chapters and deals with multi-valued decisions 
and sequential interval estimation. The results in these chapters are not de-
finitive answers to the two outstanding problems in statistical inference but are 
merely suggestive of a possible approach to them. Nevertheless, from the 
point of view of stimulating future research these 2 chapters are perhaps the 
most valuable sections of this book. The reader, having been exposed in the 
previous chapters to various tests the outcome of which is a two-valued decision, 
is naturally led in Chapter 10 to the consideration of tests the outcome of which 
is a multi-valued decision. The notion of a risk function, introduced elsewhere 
by the author in the non-sequential case, is again used as the main tool in handling 
multi-valued decisions sequentially. In Chapter 11 the important problem of 
setting up confidence intervals of fixed length by means of a sequential proce- 
dure is discussed and a possible method for accomplishing this is indicated. 

As was previously noted, the main theorems on sequential analysis are con- 
tained in the Appendix and since they have all been previously published in the 
Annals they will not be mentioned in the present review. The Appendix,
together with the main body of the book, forms a fairly exhaustive treatment of
sequential theory. A notable exception to this is the lack of any mention of the 
published research on sequential point estimation. This is probably accounted 
for by the fact that this research came too late to be included in the book. Other 
minor omissions that may be noted are references to the generalization of the 
Fundamental Identity to more than one dimension and other theorems on 
sequences of functions of random vectors which have appeared in print. Also 
no mention is made of the similarity of sequential analysis to the problems of 
the random walk and the gambler’s ruin. This, in the opinion of the reviewer, 
is regrettable. 

This book will make a very suitable companion to the book Sequential Analy- 
sis of Statistical Data: Applications prepared by the Statistical Research Group, 
Columbia University (see review by J. W. Tukey, Ann. of Math. Stat. Vol. 
xviii, 1947). While there is some overlap in the material covered, the two books 
differ in emphasis. Wald’s book, though not highly technical, is more in the 
nature of a textbook on the theory and application of sequential analysis. The 
SRG book on the other hand, was prepared mainly for statisticians who may 
wish to use sequential analysis in practice. The latter book is therefore more 
detailed and puts less emphasis on the theoretical aspects of the sequential 
procedure. 

The book is surprisingly free of typographical errors, which is a tribute to the
high quality of the editorship.













Statistical Methods. George W. Snedecor. Ames, Iowa: The Iowa State 
College Press, Inc., 1946; pp. xvi, 485. $4.50. 


REVIEWED BY FREDERICK MOSTELLER 
Harvard University 


Statistical Methods is a non-mathematical treatment of modern experimental 
statistics. Few non-mathematical books are available that treat such topics 
as confidence limits, use of transformations, and analysis of variance and covari- 
ance in the detail presented by Snedecor. The examples are largely, but not 
entirely, drawn from agriculture and animal husbandry. The exercises for 
students are extensive and thought-provoking. 

Unlike most non-mathematical texts the book under review does not spend 
pages and pages on methods of recording frequencies and methods of computing 
countless moments which are seldom used in the later developments of the text. 
There is no long exasperating discussion of kurtosis and skewness; and there is 
no parade of qualitative Greek names for categorizing frequency distributions. 

The reviewer has used this book for teaching a second course in statistics to 
social science majors with reasonable success. The main disadvantage was the
biological nature of most of the examples, but until some author writes a com- 
parable book using social science examples, the reviewer will continue to use 
Snedecor’s material for a large part of the course. 

The main differences between the Third and Fourth Editions of this text have 
been adequately summarized by Snedecor: 

“(i) greater emphasis has been placed on the theoretical conditions in which
the various statistical methods have validity, and concurrently (ii) on the conduct
of the experiment so as to incorporate in the data the information desired; (iii)
estimates and fiducial statements have been brought into equal prominence
with tests of hypotheses; (iv) there is increased reliance on experimental
samplings to exemplify distribution theory; (v) the treatment of correlation and of
experimental designs has been expanded; and (vi) the methods for disproportionate
subclass numbers have been extended to include all those necessary for
ordinary needs.” Some more obvious changes in the Fourth Edition are the
entirely new type and summaries which are included at the end of some of the 
chapters. The practice of using random sampling numbers (iv) to help explain 
theory has long been employed by teachers of statistics, but few authors have 
taken as much advantage of this technique as has Snedecor. In the Fourth 
Edition confidence intervals are widely used (iii). The author uses the
adjectives “confidence” and “fiducial” more or less interchangeably, but it is the
reviewer’s opinion that it is the Neyman concept rather than the Fisherian that 
predominates. It should be remarked that this is one of the few texts that 
give the students the idea that in linear regression we do not predict y with the 
same accuracy for every x even when linearity and homoscedasticity hold (v). 

The main emphasis of the book is on the analysis of variance. The author 
succeeds extremely well in showing the student how to carry out the analysis 




even at rather complex levels. On some other points he was not quite so suc- 
cessful. For example, the reviewer feels that the meaning of “interaction” was 
never gotten across, and that for the student the higher order interactions are 
still just things to be computed. Furthermore in attempting to make sure 
that the student understands how to do the computation the author often does 
not encourage the student to take any overall view of the data before blindly 
starting to compute. In addition, reasons for doing the experiment are some- 
times vague and the conclusions are often couched only in the jargon of analysis 
of variance. Therefore, the student seldom gets an opportunity to find out 
what kinds of recommendations might reasonably be made as the result of an 
experiment. Perhaps the worst example is on pages 275-280. Here the 
experiment deals with yield of wheat in 48 pots, with two series of soil treatments, 
humus and chemical. Anyone glancing over the results of the experiment will 
be startled to find that every yield from pots with “no humus treatment” (12 
observations) is greater than any yield with “humus treatment” (36 obser- 
vations). The reader will be further startled to find that all the evidence tends 
to support the notion that “no chemical treatment” is at least as fruitful as any
of the chemical treatments tried. However, Snedecor says “The striking feature
of this experiment is the discrepance among the subclasses. The chemicals
applied to one humus treatment produced yields out of accord with those from
other humus treatments.” Snedecor then pushes on to a more subtle analysis.
The reviewer feels that here as elsewhere in the book the author occasionally 
forgets that the extended analysis looks rather ridiculous unless the practicality 
of applying the technique is discussed. The example considered here is one in 
which the point could profitably be made that everyone can see from a visual 
examination of the data what the results of the experiment show. The analysis 
backs up the student’s common sense appraisal of the situation and gives him 
more confidence in and understanding of the method when it is applied in more 
delicate situations. It seems to the reviewer that too many times the appli- 
cation of the analysis of variance obfuscates the main point of the experiment. 
In the haste to get to the computations and the comparisons of interactions and 
errors the author frequently neglects to impress the student with the funda- 
mental differences between means and their ultimate interpretation. However, 
the author does bring out clearly the notion of the various estimates of variance, 
a subject frequently neglected. 

In the next to last chapter the binomial and Poisson distributions are discussed. 
In this connection the inverse sine and the square root transformations are 
treated briefly, as is the logarithmic transformation. It is surprising that no 
indication is given of the theoretical variances when the inverse sine and square 
root transformations are used. The theoretical discussion of the transformation 
is limited to the remark that these transformations tend to make the variance 
independent of the means, but there is no indication of the further advantages. 
This is surprising because in a much earlier chapter the use of Fisher’s trans- 
formation for correlation coefficients was treated quite adequately. It seems 










to the reviewer that in a later edition the use of transformation might well be 
moved forward in the book, and that the theoretical and practical implications 
might be treated more thoroughly. 
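
The point about theoretical variances can be illustrated numerically. A standard large-sample result, not stated in the book under review, is that the square root of a Poisson count has variance close to 1/4 whatever the mean, and that the inverse sine of the square root of a binomial proportion (in radians) has variance close to 1/(4n). The sketch below checks the first statement exactly by summing the Poisson probabilities; the function name and the truncation point are illustrative.

```python
from math import exp, sqrt


def variance_of_sqrt_poisson(mean, terms=2000):
    """Var(sqrt(X)) for X ~ Poisson(mean), by summing the probability mass."""
    p = exp(-mean)            # P(X = 0)
    e_sqrt, e_x = 0.0, 0.0
    for k in range(terms):
        e_sqrt += sqrt(k) * p
        e_x += k * p
        p *= mean / (k + 1)   # P(X = k + 1) obtained from P(X = k)
    return e_x - e_sqrt ** 2  # E[X] - (E[sqrt(X)])^2


if __name__ == "__main__":
    for mean in (5.0, 20.0, 100.0):
        print(mean, round(variance_of_sqrt_poisson(mean), 4))
```

The untransformed variance equals the mean, so it changes twentyfold over these three cases, while the variance on the square root scale stays near 0.25.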

As in most other texts the final chapter “Design and Analysis of Samplings” 
needs very considerable expansion. 

The book begins (Chapter 1) with a consideration of the sampling of attributes, 
inferences that can be drawn about the population, confidence limits, use of 
chi-square in a 1 x 2 table, and some discussion of the use of ratios, rates, and 
percentages. Measurement data is then (Chapter 2) discussed including the 
computation and application of the mean, range, standard deviation, probable 
deviation, median, and quartiles. The concepts of null hypothesis and confi- 
dence limits are introduced in Chapter 2 and elaborated in Chapter 3 which 
concerns sampling from a normally distributed population, random samples, 
distribution of the mean, variance, standard deviation, and of t. The
comparison of two groups in contrast to individuals is treated in Chapter 4 including
groups with different numbers of individuals. Chapter 5 provides material on 
short cut methods of computation using calculating machines, code numbers 
are explained, suggestions about significant numbers and rates and percentages 
are given, and the use of the ratio range/sigma is introduced. 

After considering linear regression and correlation (Chapters 6, 7) the author 
relates the two notions, and then goes on to consider some interesting special 
cases of correlation. Chapter 8 deals with large sample methods. Chapter 9 
concerns enumeration data with more than one degree of freedom, discusses 
adjustments of chi-square and its computation with large numbers of degrees of 
freedom, and describes the analysis of 2 × 2 × 2, R × 2, and R × C tables. The
computation of the analysis of variance for two or more groups of measurement 
data and with two or more criteria of classification: variance ratio F, use of 
Latin square, analysis with disproportionate subclass numbers, and the use of 
randomized blocks are considered in Chapters 10 and 11, while analysis of
covariance is treated in Chapter 12 (22 pages). Multiple regression including
partial and multiple correlation coefficients, tests of significance and confidence 
limits are handled in Chapter 13 and curvilinear regression considered in Chapter 
15. Chapter 16 deals with binomial and Poisson data, and Chapter 17 discusses 
the design and analysis of sampling, including sampling from a homogeneous 
or small population and the effectiveness of stratification. 

It seems to the reviewer that at the present time one would be hard put to 
find a better statistics text written at this level. 










NEWS AND NOTICES 
Readers are invited to submit to the Secretary of the Institute news items of interest.


Personal Items 


Dr. Franz L. Alt, who has been with the Econometric Institute, New York, 
as Assistant Director of Research, is now Deputy Chief of the Computing Labo- 
ratory at the Ballistic Research Laboratories, Aberdeen Proving Ground, Aber- 
deen, Maryland. 

Mr. A. George Carlton has accepted a position as Assistant Professor of 
Mathematics at the University of Illinois. 

Assistant Professor Paul R. Halmos, University of Chicago, Chicago, Illinois 
is on leave for the academic year. He is spending the year at the Institute for 
Advanced Study, Princeton, New Jersey on a Guggenheim Fellowship and 
will return to the University of Chicago in September, 1948. 

Mr. Henry F. Hebley of the Pittsburgh Coal Co. spent most of last summer 
in Eastern Europe carrying out a survey on coal production and fuel availa- 
bility in Poland. This work was carried out in the interest of the International 
Bank for Reconstruction and Development. 

Dr. Harold D. Larsen, former Associate Professor at the University of New 
Mexico, has joined the faculty of Albion College, Albion, Michigan. 

Mr. Dickson H. Leavens has resigned as Research Associate of the Cowles 
Commission for Research in Economics. He will continue as Managing Editor 
of Econometrica and may be addressed at 1632 Wood Avenue, Colorado Springs, 
Colorado. 

Professor S. B. Littauer, who has been Chairman of the Mathematics Depart- 
ment, Newark College of Engineering, Newark, New Jersey, has now accepted 
an associate professorship in the Department of Industrial Engineering, Columbia 
University. 

Professor Harris F. MacNeish, who has been Chairman of the Department of 
Mathematics at Brooklyn College since its foundation in 1930, has resigned 
to accept a visiting professorship in Mathematics at the University of Miami, 
Coral Gables, Florida. 

Mr. Clifford J. Maloney has resigned a position as Research Associate in the 
Statistical Laboratory of Iowa State College to serve as Chief, Statistics Branch, 
Camp Detrick, Frederick, Maryland, an agency of the Chemical Corps of the 
United States Army. 

Mr. Monroe L. Norden, who has formerly been with the Ballistic Research 
Laboratories, Aberdeen Proving Ground, Maryland, has accepted a research 
position in theoretical or mathematical statistics at the Douglas Aircraft Co., 
Santa Monica, California. 

Mr. W. E. Pattee has resigned his position as statistical engineer with the 
Canadian Industries Limited, Shawinigan Falls, Quebec and has accepted a 
position as senior chemist, Ottawa Mill, E. B. Eddy Company, Hull, Quebec. 



Mr. Robert I. Piper, who was formerly plant staff assistant at the Southern 
California Telephone Company of Los Angeles, has been transferred to the 
systems office of the Pacific Telephone and Telegraph Company. He will assist 
in planning and analysing sampling surveys of the wage rates prevailing in 
the Pacific coast states in which the company operates. 

Mr. Herbert Solomon, who was formerly an instructor at the College of the 
City of New York, has accepted an assistant professorship in the Mathematics 
Department, Newark College of Engineering, Newark 2, New Jersey. 

Dr. A. G. Swanson, formerly an assistant chairman of the Department of 
Mathematics and Mechanics at the General Motors Institute, Flint, Michigan, 
has accepted an associate professorship in the Department of Mathematics, 
Gustavus Adolphus College, St. Peter, Minnesota. 




A federal center of applied mathematics—the National Applied Mathematics 
Laboratories—has been established as a division of the National Bureau of 
Standards. The new organization is oriented around modern mathematical 
statistics as applied to the physical and engineering sciences and to the develop- 
ment and use of modern high speed computing. The applied mathematics 
laboratories include four separate laboratories: the Institute of Numerical 
Analysis; the Computation Laboratory; the Statistical Engineering Laboratory; 
and the Machine Development Laboratory. 

Two members of the Institute have been given important positions in this 
organization. Dr. John Curtiss, who has been Director’s Assistant in Applied 
Mathematics at the Bureau of Standards, has been named Chief of the National 
Applied Mathematics Laboratories. Dr. Churchill Eisenhart has been ap- 
pointed head of the Statistical Engineering Laboratory. 




Statistical Summer Sessions at the University of California, Berkeley 


Following the encouraging experience of last year the University of California 
offers statistical programs in the two Summer Sessions of 1948. The teaching 
staff is as follows: 

RAJ CHANDRA BOSE, Professor of the University of Calcutta, India.

Miss EVELYN FIX, Lecturer at the University of California, Berkeley.

ERICH L. LEHMANN, Assistant Professor of the University of California, Berkeley.

MICHEL LOÈVE, Reader at the University of London, England.

JERZY NEYMAN, Professor of the University of California, Berkeley.

ABRAHAM WALD, Professor of Columbia University, New York.

Courses in statistics are offered on both the graduate and the undergraduate 
levels. The graduate courses, all given during the First Summer Session, June 21 
to July 31, are meant primarily for students who either have already obtained 
their Ph.D. degree or are working towards it. Therefore, apart from formal 
classes, it is proposed to hold extensive seminars in which the work of students 
will be discussed. No specific prerequisites to graduate courses will be required. 
However, to benefit from the courses, the students must be generally familiar 
with the theory of statistics. In addition, course 272 and especially 271 will 
require a reasonable knowledge of the theory of functions. 

There will be two undergraduate courses offered, course S12 during the First 
Summer Session, June 21 to July 31, and course S113 during the Second Summer 
Session, August 2 to September 11. Both of these courses were recently in- 
troduced into the curriculum and are prerequisites to more advanced courses 
in statistics. They are offered during the Summer Sessions for the benefit of 
students, otherwise advanced, who plan to attend more advanced courses in 
statistics during the fall semester. Besides, course S12 is recommended for 
students who do not intend to specialize in statistics but wish to acquire some 
knowledge of this subject as a part of their general education. 

The Statistical Laboratory will be available for students doing research. 

First Summer Session 


S12. Elements of Probability and Statistics. Mr. LEHMANN
271. Random Functions. Mr. LOÈVE
272. Sequential Analysis. Mr. WALD
273. Design of Experiments. Mr. BOSE
S290s. Seminar in Theory of Statistics. Mr. LOÈVE, Mr. WALD
290t. Seminar in Design of Experiments. Mr. BOSE
S295. Individual Research. Mr. BOSE, Mr. LOÈVE, Mr. NEYMAN, Mr. WALD

Second Summer Session


S113. Second Course in Probability and Statistics. Miss FIX




Statistical Sessions at Alabama Polytechnic Institute 


Professor George W. Snedecor, President of the American Statistical Associa- 
tion and Research Professor of Statistics at Iowa State College, will be Visiting 
Research Professor of Statistics at Alabama Polytechnic Institute during the 
Spring Quarter, from March 22 to June 4, 1948. Professor Snedecor will 
lecture on Statistical Experimental Design and will be available for statistical 
consultations. 

The newly formed Statistical Laboratory at A.P.I. will also offer a course 
in Survey Sampling during the Spring Quarter to be taught by the Director, 
Professor T. A. Bancroft. Conferences in applied statistics for research workers 
in the lower southeastern states are being scheduled during the time of Pro- 
fessor Snedecor’s visit. 













New Members 
The following persons have been elected to membership in the Institute 
(September 1 to November 30, 1947) 


Afzal, M., M.A. (Panjab, India) Graduate student at Columbia Univ., 1088 John Jay Hall, 
Columbia University, New York 27, New York. 

Billeter, Ernest P., Ph.D. (Univ. of Basle) Scientific Assistant (Statistical Office, Zurich) 
Turnerstrasse 23, Basle, Switzerland. 

Bishop, David James, M.Sc. (London) Head of Operational Research Section of British 
Iron and Steel Research Association, 11 Park Lane, London W. 1., England. 

Brooks, Hamilton, B.See (Univ. of Pittsburgh) Design Engineer, Westinghouse Electric 
Corp., P.O. Box 983 E. Pittsburgh, Pennsylvania. 

Craw, Alexander R., M.S. (Univ. of Notre Dame) Instructor in Math., U. S. Naval 
Academy, Annapolis, Maryland. 

Edwards, Daisy M., A.M. (Columbia Univ.) Lecturer in Statistics, University of 
London, Institute of Education, 1, Oakfield Court, Queens Road, Weybridge, Surrey, 
England. 

Havermark, K. Gunnar, Chief of Division, Royal Social Board, Lagerlofsg 8, Stockholm, 
Sweden. 

Hollingsworth, Charles A., Ph.D., (State Univ. of Iowa) Research Chemist, 504 Maple 
Ave., Waynesboro, Virginia. 

Hurd, Cuthbert C., Ph.D. (Univ. of Ill.) Plant Statistician, Carbide and Carbon Chemicals 
Corp., Oak Ridge, Tenn. 

Isaacson, Stanley L., M.A. (Johns Hopkins Univ.) Graduate student at Columbia 
Univ., 2523 Loyola Southway, Baltimore, Maryland. 

May, Kenneth, Ph.D., (Univ. of Calif.) Assistant Professor of Mathematics, Carleton 
College, Northfield, Minnesota. 

Mirsky, Robert, A.M. (Johns Hopkins Univ.) Graduate student at Columbia Univ., 
7 West 705th Street, Shanks Village, Orangeburg, New York. 

Mulhall, Harold, B.Sc. (Sydney) Lecturer in Mathematics, Department of Mathemat- 
ics, University of Sydney, Australia. 

Palm, Conny, Ph.D. (Stockholm) Docent, Ynglingar 11, Djursholm, Sweden. 

Pease, Katharine, A.M. (Smith College) Instructor in Psychology, Barnard College, 
Columbia University, New York 27, New York. 

Peckham, Cyril G., M.S. (Univ. of Ill.) Assistant Professor of Mathematics, University 
of Dayton, Dayton 9, Ohio. 

Peterson, Raymond P., Jr., B.A. (Univ. of Calif., Los Angeles) Assistant in Mathemat- 
ics, University of California, Los Angeles, Calif., 10729 Ashton Ave., Los Angeles 
24, California. 

Pike, Eugene W., Ph.D., (Princeton) Member McFarlan, Groth & Pike, 510 Audubon 
Ave., New York 33, New York. 

Pitman, Edwin J. G., M.A. (Univ. of Melbourne) Professor of Mathematics, Univ. of 
Tasmania, Hobart, Tasmania. 

Rigby, Fred D., Ph.D., (Univ. of Iowa) Mathematician, Office of Naval Research, P.O. 
Box 234, Falls Church, Virginia. 

Smith, Clarence DeWitt, Ph.D. (Univ. of Iowa) Associate Professor of Statistics, Box 
2686, University, Alabama. 

Srinivasan, T. K., M.A. (Madras) Assistant Lecturer, Mathematics Department, 
Raja’s College, Pudukkottah, S-I-R, South India. 

Straubel, Morgan P., Quality Control Analyst, 4124 Ivanrest Road, Grandville, Michigan. 

Taylor, William F., A.B. (Univ. of Calif., Berkeley) Associate, School of Public Health, 
3042 Wheeler St., Berkeley, California. 







Trindade, Mario, Chief of the Statistical Division of the Instituto de Resseguros do Brazil, 
Rua Senador Soares 33, ap. 201, Rio de Janeiro, Brazil. 

Von Schelling, Hermann, Ph.D. (Univ. of Berlin) Naval Medical Research Laborato- 
ry, U.S. Submarine Base, New London, Conn. 

Whidden, Phillips, A.B. (Harvard) Part-time Instructor in Mathematics, Carnegie Insti- 
tute of Technology, Pittsburgh 13, Pa. 

Wolman, William, B.B.A. (College of City of New York) Statistician, New York State 
Division of Housing, 295 Parkside Avenue, Brooklyn 26, New York. 

Woodbury, Lowell A., Ph.D. (Univ. of Michigan) Assistant Professor of Physiology, 
Dept. of Physiology, University of Utah Medical School, Salt Lake City 1, Utah. 

Yusuf, Mohammad, M.A. (Aligarh Muslim Univ., India) Graduate student at Columbia 
University, 208, Furnald Hall, Columbia University, New York 27, New York. 





REPORT ON THE BERKELEY MEETING OF THE INSTITUTE 


The thirtieth meeting of the Institute of Mathematical Statistics was held in 
Berkeley, California on Monday and Tuesday, December 22 and 23, 1947. The 
meeting was attended by approximately 70 persons, including the following 31 
members of the Institute: 


G. A. Baker, G. G. Beckstead, B. M. Bennett, R. U. Bonner, Frances L. Campbell, E. L. 
Crow, Dorothy Cruden, W. J. Dixon, R. Dorfman, G. G. Eldredge, E. A. Fay, Evelyn Fix, 
M. A. Girshick, J. Gurland, T. E. Harris, W. L. Hart, J. L. Hodges, Jr., P. G. Hoel, H. M. 
Hughes, T. A. Jeeves, H. S. Konijn, G. M. Kuznets, E. L. Lehmann, R. B. Leipnik, J. Neyman, 
Gladys Rappaport, H. Scheffé, T. W. Simpson, C. M. Stein, J. E. Walsh and H. 
Working. 


The Monday morning program, with Professor J. Neyman presiding, consisted 
of the following contributed papers: 


1. The Performance Characteristic of Certain Methods for Obtaining Confidence Intervals. 
Mr. B. M. Bennett, University of California, Berkeley. 
2. Some Further Results on the Bernoulli Process. 
Dr. T. E. Harris, Douglas Aircraft Company. 
3. Most Powerful Tests of Composite Hypotheses I. Normal Distributions. 
Dr. E. L. Lehmann and Dr. C. M. Stein, University of California, Berkeley. 
4. On the Selection of Forecasting Formulas. 
Professor P. G. Hoel, University of California, Los Angeles. 


The Monday afternoon program, with Professor H. Scheffé presiding, also 
consisted of contributed papers as follows: 


1. On the Power Function of the "Best" t-test Solution of the Behrens-Fisher Problem. 
Dr. J. E. Walsh, Douglas Aircraft Company. 
2. On Sequences of Experiments. 
Dr. C. M. Stein, University of California, Berkeley. 
3. The Effect of Selection above Definite Lower Limits of Linear Functions of Normally 
Distributed Correlated Variables on the Means and Variances of Other Linear Functions. 
Professor G. A. Baker, University of California, Davis. 
4. An Inversion Formula for the Distribution of a Ratio of Random Variables. 
Dr. J. Gurland, University of California, Berkeley. 
5. Independence of Parameters and Sufficient Statistics. 
Dr. E. W. Barankin, University of California, Berkeley. 


The Tuesday morning session, with Professor R. A. Gordon presiding, was 
devoted to the following invited and contributed papers on econometrics: 


1. Remarks on the Theory of Indices. 
Professor G. C. Evans, University of California, Berkeley. 
2. Interrelations of Theory and Statistical Research in Economics. 
Professor H. Working, Stanford University. 
3. Statistical and Case Methods in a Study of Labor Mobility. 
Professor D. McEntire, University of California, Berkeley. 
Discussion: Dr. M. Lipton, University of California, Berkeley. 
4. Distributions Associated with Continuous Stochastic Processes. 
Dr. R. B. Leipnik, University of California, Berkeley. 
5. On Some Methods of Evaluating Railway Costs. (By title) 
Miss Evelyn Fix, University of California, Berkeley. 


There was a dinner on Monday evening for members and guests at the Hotel 
Claremont and an informal discussion and coffee on Tuesday afternoon. 




REPORT ON THE NEW YORK MEETING OF THE INSTITUTE 


The Tenth Annual Meeting of the Institute of Mathematical Statistics was 
held at the Commodore Hotel, New York City, on December 28-30, 1947. The 
meeting was held in conjunction with the American Statistical Association. 
The following 173 members of the Institute were in attendance: 


F. S. Acton, R. L. Anderson, H. E. Arnold, L. A. Aroian, M. Astrachan, H. M. Baldwin, 
W. D. Baten, R. E. Bechhofer, G. W. Beebe, M. H. Belz, A. A. Bennett, A. J. Berman, A. 
Blake, C. I. Bliss, P. Boschan, A. H. Bowker, A. E. Brandt, T. H. Brown, M. A. Brumbaugh, 
M. C. Bruyere, P. T. Bruyere, T. A. Budne, R. W. Burgess, R. S. Burington, B. H. Camp, 
G. C. Campbell, P. G. Carlson, Jr., U. Chand, H. Chernoff, Kai-Lai Chung, P. C. Clifford, 
W. G. Cochran, D. D. Cody, J. Cornfield, G. M. Cox, J. H. Curtiss, J. F. Daly, G. B. Dantzig, 
D. G. Deihl, H. F. Dorn, A. J. Duncan, C. W. Dunnett, D. Durand, J. Dutka, P. S. 
Dwyer, G. L. Edgett, C. Eisenhart, B. Epstein, M. W. Eudey, W. D. Evans, Will Feller, 
C. D. Ferris, C. B. Fine, M. M. Flood, L. R. Frankel, J. E. Freund, B. Friedman, Hilda 
Geiringer, M. A. Geisler, H. H. Germond, M. A. Girshick, Abraham Golub, C. H. Graves, 
S. W. Greenhouse, J. A. Greenwood, T. N. E. Greville, J. I. Griffin, E. T. Gumbel, M. 
Gurney, K. W. Halbert, Max Halperin, M. H. Hansen, T. E. Harris, B. Harshbarger, 
Alex Hart, P. M. Hauser, J. D. Heide, L. H. Herbach, M. W. Hirsch, Harold Hotelling, H. 
M. Humes, C. C. Hurd, S. Jablon, C. M. Jaeger, A. S. Kaitz, Leo Katz, T. L. Kelley, L. S. 
Kellogg, L. F. Knudsen, A. K. Kury, Jack Laderman, M. LeLeika, Joseph Lev, Howard 
Levene, J. E. Lieberman, Julius Lieblein, S. B. Littauer, Eugene Lukacs, Geo. A. Lundberg, 
J. C. McPherson, Benjamin Malzberg, Sophie Marcuse, E. S. Marks, H. C. Mathisen, J. 
W. Mauchly, A. L. Mayerson, Margaret Merrell, E. B. Mode, E. C. Molina, M. E. Moore, 
D. J. Morrow, J. E. Morton, Jack Moshman, Hugo Muench, D. N. Nanda, M. G. Natrella, 
Doris Newman, G. E. Nicholson, Jr., Harold Nisselson, Nilan Norris, H. W. Norton, P. S. 
Olmstead, A. L. O’Toole, A. E. Paull, C. N. Payne, Katherine Pease, M. P. Peisakoff, E. 
W. Pike, O. A. Pope, G. B. Price, L. J. Reed, J. S. Rhodes, S. F. Robinson, A. C. Rosander, 
Ernest Rubin, P. J. Rulon, ose Sachs, Frank Saidel, Arthur Sard, M. M. Sandomire, F. 
E. Satterthwaite, E. D. Schell, Bernice Scherl, O. N. Serbein, R. G. Seth, Harry Shulman, 
Rosedith Sitgreaves, C. DeW. Smith, G. W. Snedecor, Herbert Solomon, D. E. South, 
Arthur Stein, G. T. Steinberg, Joseph Steinberg, A. I. Sternhell, S. A. Stouffer, J. V. Sturtevant, 
B. R. Suydam, W. R. Thompson, Gerhard Tintner, J. W. Tukey, D. F. Votaw, Jr., 
A. J. Wadman, H. M. Walker, Dzung-shu Wei, Sidney Weiner, Samuel Weiss, Sophie R. 
Wilkey, R. I. Wilkinson, S. S. Wilks, C. P. Winsor, Jacob Wolfowitz, W. J. Youden. 


The first session, a joint session with the American Statistical Association, was 
held on the morning of December 28 and was devoted to the topic The Teaching 
of Statistics. Professor W. G. Cochran of North Carolina State College presided. 
A paper entitled Three Recent Reports Dealing with the Teaching of Statistics, 
the Training of Statisticians and the Crisis in Statistical Personnel was presented 
by Dr. James D. Paris of the Metropolitan Life Insurance Company. Many 
members participated in the general discussion which followed. 

The second session on The Teaching of Statistics, also with the American Statistical 
Association, was held at 1:15 P.M. Professor Francis G. Cornell of the 
University of Illinois was chairman. The main paper of the session, by 
Professor George W. Snedecor of Iowa State College, was entitled Syllabus 
for a Proposed Course in Basic Statistics. This was followed by prepared 
discussion by Professors Elmer B. Mode, Boston University; Helen M. Walker, 
Teachers College, Columbia University; Samuel A. Stouffer, Harvard University; 
and Albert E. Waugh, Department of Economics, University of Connecticut. 
Many members participated in the general discussion. At the 
conclusion of this session, a film on Modern Quality Control was shown by Mr. 
Simon Collier of the Johns Manville Company. 

Two Monday sessions, also held jointly with the American Statistical As- 
sociation, and with the cooperation of the Operations Evaluation Group of the 
Navy and the Operations Analysis of the Air Force, were devoted to Operations 
Research. Professor Edward L. Bowles of Massachusetts Institute of Technology 
presided at the morning session. The following papers: 


1. Operations Research in the Department of the Navy. 
Dr. J. Steinhardt, Director, Operations Evaluation Group. 
2. Operations Research in the Department of the Air Forces. 
Dr. Leroy A. Brothers, Chief, Operations Analysis. 


were followed by discussion by Dr. Arthur A. Brown, Operations Evaluation 
Group, Dr. Thomas I. Edwards, Operations Analysis, Professor G. Baley Price, 
The University of Kansas and Wartime Operations Analyst and Dr. W. J. 
Youden, Douglas Aircraft Company and Wartime Operations Analyst. 

Dr. Merrill M. Flood, Assistant Deputy Director of Research and Develop- 
ment, General Staff, U. S. Army, presided at the afternoon session. The fol- 
lowing papers were presented: 


1. Operations Analysis in the Southwest Pacific Air War. 
Dr. Roger I. Wilkinson, Bell Telephone Laboratories and Wartime Operations Ana- 
lyst. 

2. Operations Analysis of Air-Sea Rescue. 
Dr. E. S. Lamar, Operations Evaluation Group. 

3. Factorial Chi-Square in Test Shooting. 
Dr. A. E. Brandt, Technical Director, Naval Ordnance Laboratory and Wartime 
Operations Analyst. 

4. Mathematical Techniques of Program Planning. 
Dr. George Dantzig, Consultant to the Air Comptroller, Headquarters, USAF. 


A session on the Application of the Theory of Extreme Values was held jointly 
with the American Statistical Association on Tuesday, December 30. Professor 
Jacob Wolfowitz of Columbia University presided at the session. The following 
papers were presented: 







1. Introduction: The Mathematical Theory of Extreme Values. 
Professor Richard Von Mises, Harvard University. 
2. Applications to the Prediction of Flood Flows. 
Professor Emil Gumbel, Brooklyn College. 
3. Applications to Meteorology. 
Dr. Horace Norton, Weather Bureau, Washington, D. C. 
4. Applications to Fracture Problems. 
Dr. Benjamin Epstein, Coal Research Laboratory, Carnegie Institute of Technology. 


The session concluded with discussion by Miss Marion Sandomire, Navy Depart- 
ment, Bureau of Ships and Dr. Bradford Kimball, Port Washington, New York. 

A session on Statistical Techniques in Life Insurance was held jointly with 
the American Statistical Association at 1:15 P.M., December 30. Mr. Robert 
J. Myers, Actuarial Consultant, Social Security Administration, was chairman 
of the meeting. The following papers were presented: 


1. Problems with Sampling Procedures for Reserve Valuations. 
Mr. George C. Campbell, Supervisor, Actuarial Division, Metropolitan Life Insurance 
Company. 
2. Sampling Errors in Life Insurance Mortality and Other Statistics. 
Mr. Donald Cody, Assistant Actuary, Equitable Life Assurance Society. 
3. Recent Developments in Graduation and Interpolation. 
Dr. T. N. E. Greville, National Office of Vital Statistics, U.S. Public Health Service. 


A session of contributed papers was held at 3:30 P.M. on December 30. Dr. 
T. N. E. Greville of the National Office of Vital Statistics presided. The fol- 
lowing papers were presented: 


1. Distribution of the Circular Serial Correlation Coefficient for Residuals from a Fitted 
Fourier Series. (Preliminary Report.) 
Professor R. L. Anderson, North Carolina State College and Professor T. W. Ander- 
son, Jr., Columbia University. 

2. Some New Methods for Distributions of Quadratic Forms. 
Professor Harold Hotelling, Institute of Statistics, University of North Carolina. 

3. Frequency Functions Defined by the Pearson Difference Equation. 
Professor Leo Katz, Michigan State College, East Lansing. 

4. Distribution of the Sum of Roots of a Determinantal Equation Under a Certain Condition. 
Mr. D. N. Nanda, Institute of Statistics, University of North Carolina. 

5. Applications of Carnap’s Probability Theory to Statistical Inference. 
Professor Gerhard Tintner, Department of Economics, Iowa State College. 

6. Circular Probable Error of an Elliptical Gaussian Distribution. 
Dr. H. H. Germond, S. W. Marshall & Co., Washington, D.C. 


The annual business meeting of the Institute was held at 4:30 P.M., December 
29, 1947 in the ball room of the Commodore Hotel. There were reports by the 
President, Secretary-Treasurer, Mr. Morris Hansen, Chairman of the Com- 
mittee on Planning and Development, and Dr. John Curtiss, Chairman of the 
Program Committee. Mr. Hansen presented a tentative form of the proposed 
new constitution while Dr. Curtiss discussed program plans. There was some 
discussion on these general questions from the floor. 







Professor A. Wald was elected President, and Dr. Churchill Eisenhart and 
Professor Henry Scheffé, Vice-Presidents. 
Paul S. Dwyer, 
Secretary. 




REPORT ON THE CHICAGO MEETING OF THE INSTITUTE 


The thirty-second meeting of the Institute of Mathematical Statistics was 
held at the Sherman Hotel, Chicago, Monday and Tuesday, December 29-30. 
The meeting was held in conjunction with the one hundred fourteenth meeting 
of the American Association for the Advancement of Science and Co-operating 
Associated Societies. The following twenty-eight members of the Institute 
attended the meeting: 


W. Bartky, D. H. Blackwell, G. M. Brown, I. W. Burr, A. G. Carlton, M. Castellanos, 
C. W. Cotterman, A. T. Craig, J. H. Davidson, R. C. Davis, W. E. Deming, M. Elveback, 
M. L. Garbuny, W. W. Gutzman, T. J. Jaramillo, E. S. Keeping, T. C. Koopmans, E. L. 
Lahti, M. M. Lavin, K. May, J. A. Pierce, O. Reiersol, H. Rubin, L. J. Savage, J. Silber, 
W. A. Wallis, E. L. Welker and J. W. Wilkins. 


The Monday afternoon session was devoted to contributed papers of Section A, 
AAAS, and of the Institute, and to the Vice-Presidential address of Section A. 
The following papers were presented: 


1. On the Boundary Layer Motion along a Periodically Oscillating Plane in Compressible 
Viscous Fluids. 
Dr. M. Z. Krzywoblocki, University of Illinois. 
2. Variations of the Probability of Unfair Election Results. 
Dr. Kenneth May, Carleton College. 
3. Normal Equations with Nearly Vanishing Determinants. 
Dr. M. Herzberger and Dr. R. Norris. 
4. Composition of Binary Quadratic Forms. 
Professor Gordon Pall, Illinois Institute of Technology. 
5. A Proof of the Asymptotic Analogue of the Theorem of Cramér and Rao. 
Dr. Herman Rubin, Institute for Advanced Study. 
6. The Solution of Differential Equations in the Presence of Turning Points, Vice-Presidential 
address of Section A. 


The Tuesday afternoon session was also a joint session of Section A and the 
Institute, with Dean Walter Bartky of the University of Chicago presiding. 
The following two papers were presented upon invitation of the Institute: 


1. Application of the Radon-Nikodym Theorem to the Theory of Sufficient Statistics. 
Professor P. R. Halmos and Dr. L. J. Savage, University of Chicago. 

2. Unbiased Sequential Estimation. 
Professor David Blackwell, Howard University. 







REPORT OF THE PRESIDENT OF THE INSTITUTE FOR 1947 


The healthy growth of the Institute has continued through 1947. The 
membership increased from 900 to 1046. This increase is gratifying as a sign 
that more and more people appreciate the usefulness of basic theory and are 
ready to support research by making our Annals possible. It is also pleasing 
to note that statistical theory and methodology are reaching new fields and 
that new groups as a whole are becoming conscious of the usefulness of contact 
with mathematical statistics. These developments are reflected in the meetings 
of the Institute. 

Meetings. The Ninth and Tenth Annual meetings (for 1946 and 1947) were 
held in the traditional way in conjunction with the meetings of the American 
Statistical Association (January—Atlantic City and Christmas—New York). 
The Tenth Summer Meeting was held with the American Mathematical Society 
and the Mathematical Association of America (September—Yale). Regional 
meetings were held in California (June—San Diego, December—Berkeley) and 
in Chicago (December), the latter in conjunction with the meetings of the 
American Association for the Advancement of Science (AAAS). Moreover, 
two meetings were organized with specialized programs of interest to groups 
with whom the Institute has not previously had much contact. A meeting 
in April at Columbia University, co-sponsored by the American Mathematical 
Society, was devoted to Stochastic Processes and Random Noise, and another 
meeting held simultaneously at Atlantic City was in conjunction with the meeting 
of the Eastern Psychological Association. It is clear that with such diversified 
meetings the Program Committee could not always act as a unit. J. H. Curtiss 
was its Chairman and J. Neyman and J. W. Tukey arranged some of the pro- 
grams. Other members of the Committee were: C. W. Churchman, T. 
Koopmans, F. C. Mosteller, J. Neyman, H. Scheffé, J. Wolfowitz, and H. 
Working. 

At the Tenth Summer Meeting A. Wald delivered the first Henry L. Rietz 
Memorial Lecture. It is desirable to preserve the solemnity of the occasion 
of the Rietz lectures and it was therefore decided that they should not be given 
every year. Accordingly, no Rietz lecturer has been selected for 1948. 

The Institute had no share in the program of the International Statistical 
Congress in Washington. However, Fellows of the Institute were invited to 
that Congress. This Congress and the Princeton Bi-Centennial were beneficial 
by establishing more intimate personal ties with our European colleagues. It is 
widely felt on both sides of the ocean that a closer cooperation, in particular 
with British statisticians, is highly desirable. Various suggestions in that 
direction were informally discussed in Washington and Princeton and M. G. 
Kendall has kindly consented to explore the practical possibilities. It is needless 
to say that the Institute is eager to do everything possible to promote cooperation 
and increase its usefulness also to our British colleagues. 



Relations with other organizations. It is gratifying to note that the cooperation 
of the Institute with sister societies is growing in intensity. The last two Presidential 
reports mentioned plans for a reorganization of the American Statistical 
Association with a view to more intimate relations among statistical societies. 
The revision of the constitution of the Association is not yet completed. It now 
appears that the American Mathematical Society also feels the need of closer 
collaboration with all groups interested in applied mathematics. It is too early to 
predict the results of these movements but it is clear that we must devote careful 
thought to our own organization and to our future relations with other groups. 

In 1947 the AAAS organized an Inter-Society Committee for the National 
Science Foundation Legislation. At the first meeting in Washington we were 
represented by J. H. Curtiss and W. A. Shewhart and at the meeting in December 
in Chicago by W. Bartky. In ballots on the two controversial subjects the 
Institute voted against exclusion of social sciences and abstained on the question 
of patent rights. W. Feller represented the Institute on the Policy Committee 
of the American Mathematical Society. Through this Committee the Institute 
went on record as favoring the National Science Foundation Bill. Otherwise 
the discussions of the Policy Committee were mostly connected with the es- 
tablishment of an International Mathematical Union. Cletus O. Oakley rep- 
resented the Institute on the Publicity Committee of the American Mathematical 
Society of which he is chairman. G. W. Snedecor was our representative on the 
AAAS Council, W. Bartky on the National Research Council, F. C. Mosteller 
and S. S. Wilks on the Joint Committee for the Development of Statistical 
Application in Engineering and Manufacturing. In recent years the common 
interests of the Institute and the actuarial profession have grown in importance 
and it has been suggested that closer cooperation would be beneficial to both 
parts. A new committee has been established to explore these possibilities 
and in particular to arrange a joint meeting during 1948. Members of this 
committee are: G. C. Campbell, T. N. E. Greville, C. Fisher, C. Spoerl, 
Chairman. 

Internal Work. The growth of the Institute has rendered parts of the Con- 
stitution obsolete and a revision seems indicated. In particular, it appears that 
the present system of elections is no longer satisfactory. The Institute is deeply 
indebted to its Committee on Planning and Development which has devoted 
much thought and consideration not only to a revision of the Constitution but 
also to the future development of the Institute as a whole. The membership 
had occasion to discuss the preliminary plans at two business meetings. M. H. 
Hansen acted as Chairman of the Committee; other members were: J. H. Curtiss, 
W. G. Cochran, J. Neyman, H. W. Norton, F. F. Stephan, J. W. Tukey, W. A. 
Wallis. 

A sharp increase in printing costs has, unfortunately, necessitated an increase 
in membership dues. However, the membership should rest assured that the 
financial position of the Institute is intrinsically sound. The cash prospects 
for 1948 are not rosy, but this is due principally to the necessity of reprinting 
back numbers of the Annals, which in itself is a sign of health and promise of 
stability. At present the Institute has a considerable reserve in back numbers 
and this reserve is rapidly being transformed into cash. We are also exploring 
the possibilities of new revenue and have started a campaign to get advertise- 
ments for the Annals. A possible campaign for institutional members is held 
in abeyance pending a clarification of our formal relations with sister societies. 
In order to make the Annals available in European countries with monetary 
exchange restrictions, the dues and subscriptions have been increased only for 
the Western Hemisphere. The investments of the Institute have been super- 
vised by the Finance Committee consisting of C. F. Roos, L. A. Knowler, F. F. 
Stephan, and Paul S. Dwyer, Chairman. 

Last year’s Committee on Teaching completed its work and submitted a 
detailed report which will be of great value. It will be published in the Annals 
of Mathematical Statistics. The Committee has been dissolved with special 
thanks of the Board of Directors for their successful work. H. Hotelling was 
chairman and its members were Walter Bartky, W. Edwards Deming, Milton 
Friedman, and Paul Hoel. The Committee on Tabulation under the chairman- 
ship of C. Eisenhart and consisting of Paul S. Dwyer, H. Goldstine, A. Lowan, 
H. W. Norton, and G. R. Stibitz has outlined the work for the coming years 
which promises to be of great interest. 

The Membership Committee consisted of C. C. Craig, P. G. Hoel, and J. H. 
Curtiss as Chairman. On its recommendations the following members were 
elected Fellows: T. W. Anderson, David Blackwell, Frederick Mosteller, Gerhard 
Tintner, Charles P. Winsor, Alexander Aitken, George Darmois, Ragnar Frisch, 
Robert C. Geary, and John Wishart. The Nominating Committee consisted of 
Meyer A. Girshick, Paul G. Hoel, Horace W. Norton, Frederick Mosteller, 
and George W. Snedecor, Chairman. A. Wald was nominated for President, 
and as an innovation four nominations for Vice-presidents were made: C. 
Eisenhart, A. M. Mood, Henry Scheffé, F. F. Stephan. 

The Annals of Mathematical Statistics are covered by a special report of the 
Editor. However, it is appropriate to say that the Institute takes pride in the 
development of the Annals. While members see only its spectacular success, 
they should bear in mind that this is mostly due to the work of one man, S. S. 
Wilks. In view of the great variety of interests of our membership and the 
many desirable directions in which the Annals could develop, it is clear that 
the work of the Editor can not always be pleasing and naturally often means a 
nervous burden. I feel sure that I speak for all our members in expressing the 
Institute’s sincere thanks to S. S. Wilks not only for his work but also for his 
wisdom in striking a sensible balance between many wishes and possibilities 
and leading the Annals so successfully in a direction satisfactory to all of us. 

In thanking all other members who have contributed to the work of the Insti- 
tute, it is hard to find appropriate words to express appreciation for the un- 
selfish efforts and devotion of our Secretary-Treasurer. Few members will 
realize how much of Dwyer’s time and thoughts are spent for the Institute 
and how much the smooth running of the affairs of the Institute is due to his 
hard work. 

Finally, it is a pleasant duty to express our thanks and appreciation to Prince- 
ton University and to the University of Michigan. These Institutions have 
generously provided office space and other help which has greatly facilitated 
our work and saved us expenses. 


WILL FELLER, 
President, 1947. 
December 31, 1947. 





REPORT OF THE SECRETARY-TREASURER OF THE 
INSTITUTE FOR 1947 


At the beginning of 1947 the Institute had 900 members, and during 1947 
210 new members (10 of whom begin their membership with 1948) joined the 
Institute. During 1947 the Institute lost 73 members, 43 by resignation, 25 
by suspension for non-payment of dues, and 5 by death. The Institute has 
1,037 members as it starts 1948. 
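
The membership count above can be checked with a line or two of arithmetic. What follows is only an illustrative sketch in Python; the variable names are assumptions introduced here and do not appear in the report.

```python
# Membership reconciliation for 1947, using the figures reported above.
# All names are illustrative only.
members_start_1947 = 900
new_members = 210  # includes 10 whose membership begins with 1948
losses = {"resignation": 43, "suspension": 25, "death": 5}

total_losses = sum(losses.values())
members_start_1948 = members_start_1947 + new_members - total_losses

print(total_losses)        # 73
print(members_start_1948)  # 1037
```

The three categories of loss account for all 73 departures, and the result agrees with the 1,037 members reported at the start of 1948.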

The following members died during the year: 


Margaret J. Dix 
Professor Irving Fisher 
Albert M. Freeman 
Professor Henry A. Ruger 
Professor James G. Smith 


A summary of the financial transactions of the Institute is given in the 
Financial Statement for 1947 which follows: 


FINANCIAL STATEMENT 
December 31, 1946 to December 31, 1947 
A. RECEIPTS 

BALANCE ON HAND,* DECEMBER 31, 1946 ........................   $7,241.55 
[DUES] .....................................................    5,054.43 
[entry label illegible] ....................................      287.50 
SUBSCRIPTIONS ..............................................    2,892.93 
[SALES OF BACK NUMBERS] ....................................    3,969.95 
NET INCOME FROM INVESTMENTS ................................       63.00 
MISCELLANEOUS ..............................................       76.56 
                                                              ---------- 
                                                              $19,585.92 


B. EXPENDITURES 

ANNALS—CURRENT 
    [entry label illegible] ......................  [illegible] 
    [Waverly Press] ..............................    7,145.79     $7,306.19 

ANNALS—BACK NUMBERS 
    Reprinted 500 copies each: Vol. III #1 & 2; IV #2; V #2; VII 
      [remainder of list illegible] ..............    3,039.00 
    [entry label illegible] ......................      143.75      3,182.75 

MATHEMATICAL REVIEWS AND INTER-SOCIETY FOR NATIONAL SCI[ENCE] 
    [remainder of label illegible] ...............                    135.00 

* In bank deposits and government bonds. 


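OFFICE OF THE SECRETARY-TREASURER 
    Printing, memoranda, etc. (including some stamped envelopes) ..   1,100.49 
    Postage, supplies, express, telephone calls and cables ........ [illegible] 
    Clerical help ................................................. [illegible]    3,002.80 

[entry label illegible] ........................................................     100.81 
[BALANCE ON HAND, DECEMBER 31, 1947] ...........................................   5,858.37 
                                                                                 ---------- 
                                                                                 $19,585.92 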


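C. SUMMARY OF RECEIPTS AND EXPENDITURES 

BALANCE ON HAND,* DECEMBER 31, 1946 ........................   $7,241.55 
RECEIPTS DURING 1947 .......................................   12,344.37 

[EXPENDITURES DURING 1947] .................................   13,727.55 
[BALANCE ON HAND, DECEMBER 31, 1947] .......................    5,858.37 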


D. COMPARISON OF ASSETS ON DECEMBER 31, 1946 AND DECEMBER 31, 1947 

                                                           1946           1947 
U. S. Government G Bonds ..........................   [?],000.[??]    $3,000.00 
Life Membership Funds .............................   [?],888.[??]     1,888.00—Bonds 
                                                                         427.00—Bank Dep. 
Additional Bank Deposits ..........................      214.[??]        543.37 
Current Accounts Receivable .......................      4[??.??]        423.55 
Estimated Value (Cost) of back issues of Annals** .    7,234.58       10,866.73 
                                                      ----------     ---------- 
                                                      $14,928.75     $17,148.65 


E. LIABILITIES OF INSTITUTE OF MATHEMATICAL STATISTICS AS OF DECEMBER 31, 1947 


All bills which have been presented have been paid. The Life Membership Fund now 
contains $2,315.00 which covers 30 members. Also $3,348.11 has been paid in for 1948 
(and later) dues and subscriptions. 


The increase in the size of the Annals from 500 to 600 pages and the phe- 
nomenal activity in the sales of back numbers are the two most important factors 
to be considered in comparing the 1947 statement with those of previous years. 
The Waverly Press bills for 1946 totalled $4,566.27 while the corresponding 
amount for 1947 was $7,145.79, an increase of 56%. The increase is attributable 
not only to the increased size of the Annals but also to the fact that printing 
costs are rising rapidly and, to a less extent, to the fact that we are printing a 
larger number of copies. It is to be noted that the cost of the Annals alone in 
1947 was over $2,000 more than the amount received from dues. As a result 
of the increase in dues, the 1948 report should be more satisfactory in this respect. 
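
The 56% figure quoted above follows directly from the two Waverly Press totals. The short Python sketch below reproduces it; the variable names are assumptions made for illustration, and exact decimal arithmetic is used only to avoid rounding noise in dollar amounts.

```python
from decimal import Decimal

# Waverly Press bills as quoted in the paragraph above (names are illustrative).
waverly_1946 = Decimal("4566.27")
waverly_1947 = Decimal("7145.79")

increase = waverly_1947 - waverly_1946
percent_increase = increase / waverly_1946 * 100

print(increase)                 # 2579.52
print(round(percent_increase))  # 56
```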

The phenomenal sales in back issues, noted in the report for 1946, were ac- 
celerated in 1947. We sold nearly $4,000 of back issues. These extensive 
sales were embarrassing to our cash position since they exhausted many of our 
issues and the continued reprinting forced us to place a considerable portion of 
our reserves in inventory (some of which probably will not be returned to cash 
within decades). Eleven issues were reprinted during the first six months of 
1947. The resulting low cash position forced a temporary change in the policy 
of reprinting issues as they became exhausted. 

** Cost of Annals calculated at 67 cents per copy. 

It was necessary to cash two $1,000 interest-bearing G bonds to meet the 
Waverly and reprinting bills as they came due. These brought $1,938.00 rather 
than the $2,000 at which they have been valued in previous reports. As the income from 
bonds during the year was $125, I have entered the net income from investments 
as $63.00. 
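
The $63.00 entry can be reproduced on the assumption, suggested by the paragraph above, that the shortfall on the two cashed bonds was charged against the year's bond income. The Python sketch below is illustrative only, and its variable names are not taken from the report.

```python
# Net income from investments for 1947, in whole dollars.
# Assumption: the loss on cashing the bonds is netted against bond income.
book_value_of_bonds = 2 * 1000  # two $1,000 G bonds, as valued in earlier reports
proceeds_from_cashing = 1938    # amount actually received
bond_income = 125               # income from bonds during the year

loss_on_cashing = book_value_of_bonds - proceeds_from_cashing
net_investment_income = bond_income - loss_on_cashing

print(loss_on_cashing)        # 62
print(net_investment_income)  # 63
```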

An attempt has been made to keep down the costs of the office of the Secretary- 
Treasurer. The expense for 1947 was about $100 more than the expense for 
1946 and seems very satisfactory in view of the larger membership and greatly 
increased costs of all materials and services. 

For the reasons indicated above, the cash position (including bonds and Life 
Membership payments) was lowered during the year by $1,383.18. This is 
compensated for by an increase in the value of the stock of back issues (valued 
at cost) of $3,632.15. Some members of the Finance Committee feel that it is 
improper to list all of this stock as assets since we can probably sell only a portion 
of it in the next five or ten years. However, we did sell nearly $4,000 of Annals 
in 1947 and it is indicated (at the new prices) that the sales of issues we have 
now on hand will yield us $11,000 in the next five to ten years. 
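
The two changes quoted in this paragraph, and the resulting change in total assets, can be recomputed from the balances and asset totals given in the Financial Statement. The Python sketch below is offered only as a check; the variable names are assumptions introduced for illustration.

```python
from decimal import Decimal

# Figures taken from the Financial Statement above; names are illustrative.
cash_1946 = Decimal("7241.55")     # balance on hand, December 31, 1946
cash_1947 = Decimal("5858.37")     # balance on hand, December 31, 1947
stock_1946 = Decimal("7234.58")    # back issues at cost, end of 1946
stock_1947 = Decimal("10866.73")   # back issues at cost, end of 1947
assets_1946 = Decimal("14928.75")  # total assets, end of 1946
assets_1947 = Decimal("17148.65")  # total assets, end of 1947

cash_decrease = cash_1946 - cash_1947
stock_increase = stock_1947 - stock_1946
asset_increase = assets_1947 - assets_1946

print(cash_decrease)   # 1383.18
print(stock_increase)  # 3632.15
print(asset_increase)  # 2219.90
```

The fall in cash is more than offset by the rise in the stock of back issues, which is why total assets still increase over the year.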

Many of the issues which were stored in Iowa City have been sold and Pro- 
fessor Knowler has sent the remaining issues to Ann Arbor. I wish to acknowl- 
edge the work of Professor Knowler in caring for these issues and to express the 
appreciation of the Institute for his efforts over a period of years. I also wish 
to express my appreciation to Mr. Carl Bennett who contributed much time 
and energy in looking after the back issues at Ann Arbor. 

This report does not cover the amount of $390.20 which is held temporarily 
by the Institute for the fund for Annals for Countries Devastated by War. 
Arrangements are being made to purchase Annals for certain institutions which 
the Committee is recommending. 

PAUL S. DWYER, 
Secretary-Treasurer. 
December 31, 1947. 





REPORT OF THE EDITOR FOR 1947 


During the past year the increase in the number of manuscripts submitted 
to the Annals has continued. More manuscripts have been received from 
foreign countries than in any preceding year. During 1947 papers were pub- 
lished by authors in Argentina, Australia, Canada, England, France and Sweden. 
If manuscripts continue to be received at the present rate it will not be possible 
to publish them in the Annals without further expansion. The gap between 
receipts of manuscripts and publication is likely to become serious by the end 
of 1948. The 1947 volume of the Annals contained 56 papers of which 25 were 
short notes. The total number of pages printed was 618, representing an 
increase of approximately 11% over the size of the 1946 volume. It now appears 
that increased printing costs will prevent a further increase in the size of the 
Annals for 1948. It is therefore extremely important that authors submitting 
papers to the Annals make every effort to keep their papers as brief as possible. 

Contributions to probability and statistical theory are continuing to come 
in from a wide variety of fields. They were written by biologists, chemists, 
economists, mathematical statisticians, mathematicians and physicists, rep- 
resenting universities, government agencies and laboratories, business and 
industrial organizations. Some of these contributions are rather heterogeneous 
in quality of results and presentation. However, patient attempts are being 
made to have all papers with novel and interesting results suitably revised and 
published. Attempts to have expository papers prepared are being continued. 

The Editor wishes to take this opportunity to acknowledge, on behalf of the 
Editorial Committee, the generous refereeing assistance which has been given 
by the following persons: L. A. Aroian, Z. W. Birnbaum, David Blackwell, 
A. H. Bowker, I. W. Burr, G. W. Brown, K. L. Chung, W. J. Dixon, T. N. E. 
Greville, F. E. Grubbs, J. B. S. Haldane, T. E. Harris, C. Hastings, L. Henkin, 
G. A. Hunt, B. F. Kimball, T. Koopmans, S. Kullback, E. L. Lehmann, H. 
Levene, H. B. Mann, P. J. McCarthy, W. E. Milne, R. Otter, M. P. Peisakoff, 
H. E. Robbins, L. J. Savage, F. F. Stephan, D. F. Votaw, and J. E. Walsh. 

The Editor is also indebted to the following persons at Princeton University 
for preparation of manuscripts for the printer, and other editorial and office 
assistance: Miss Jacqueline G. Foster, M. F. Freeman and J. E. Walsh. 

S. S. WILKS, 
Editor. 
December 31, 1947. 





CONSTITUTION AND BY-LAWS 
OF THE 
INSTITUTE OF MATHEMATICAL STATISTICS 




Constitution 
ARTICLE I 
NAME AND PURPOSE 


1. This organization shall be known as the Institute of Mathematical Statistics. 
2. Its object shall be to promote the interests of mathematical statistics. 


ARTICLE II 
MEMBERSHIP 


1. The membership of the Institute shall consist of Members, Fellows, Honorary 
Members, and Sustaining Members. 

2. Voting members of the Institute shall be (a) the Fellows, and (b) all others, Junior 
members excepted, who have been members for twenty-three months prior to the date 
of voting. 

3. No person shall be a Junior Member of the Institute for more than a limited term 
as determined by the Committee on Membership and approved by the Board of Directors. 


ARTICLE III 
OFFICERS, BOARD OF DIRECTORS, AND COMMITTEE ON MEMBERSHIP 


1. The Officers of the Institute shall be a President, two Vice-Presidents, and a Secre- 
tary-Treasurer. The terms of office of the President and Vice-Presidents shall be one 
year and that of the Secretary-Treasurer three years. Elections shall be by majority 
ballots at Annual Meetings of the Institute. Voting may be in person or by mail. 

(a) Exception. The first group of Officers shall be elected by a majority vote of the 
individuals present at the organization meeting, and shall serve until December 31, 1936. 

2. The Board of Directors of the Institute shall consist of the Officers, the two previous 
Presidents, and the Editor of the Official Journal of the Institute. 

3. The Institute shall have a Committee on Membership composed of a Chairman and 
three Fellows. At their first meeting subsequent to the adoption of this Constitution, the 
Board of Directors shall elect three members as Fellows to serve as the Committee on 
Membership: one member of the Committee for a term of one year, another for a term 
of two years, and another for a term of three years. Thereafter the Board of Directors 
shall elect from among the Fellows one member annually at their first meeting after their 
election for a term of three years. The President shall designate one of the Vice-Presi- 
dents as Chairman of this Committee. 


ARTICLE IV 
MEETINGS 


1. A meeting for the presentation and discussion of papers, for the election of Officers, 
and for the transaction of other business of the Institute shall be held annually at such 
time as the Board of Directors may designate. Additional meetings may be called from 
time to time by the Board of Directors and shall be called at any time by the President 
upon written request from ten Fellows. Notice of the time and place of meeting shall 
be given to the membership by the Secretary-Treasurer at least thirty days prior to the 
date set for the meeting. All meetings except executive sessions shall be open to the 
public. Only papers accepted by a Program Committee appointed by the President 
may be presented to the Institute. 

2. The Board of Directors shall hold a meeting immediately after their election and 
again immediately before the expiration of their term. Other meetings of the Board may 
be held from time to time at the call of the President or any two members of the Board. 
Notice of each meeting of the Board, other than the two regular meetings, together with 
a statement of the business to be brought before the meeting, must be given to the mem- 
bers of the Board by the Secretary-Treasurer at least five days prior to the date set there- 
for. Should other business be passed upon, any member of the Board shall have the 
right to reopen the question at the next meeting. 

3. Meetings of the Committee on Membership may be held from time to time at the 
call of the Chairman or any member of the Committee provided notice of such call and 
the purpose of the meeting is given to the members of the Committee by the Secretary- 
Treasurer at least five days before the date set therefor. Should other business be passed 
upon, any member of the Committee shall have the right to reopen the question at the 
next meeting. Committee business may also be transacted by correspondence if that 
seems preferable. 

4. At a regularly convened meeting of the Board of Directors, four members shall 
constitute a quorum. At a regularly convened meeting of the Committee on Member- 
ship, two members shall constitute a quorum. 


ARTICLE V 


PUBLICATIONS 


1. The Annals of Mathematical Statistics shall be the Official Journal for the Institute. 
The Editor of the Annals of Mathematical Statistics shall be a Fellow appointed by the 
Board of Directors of the Institute. The term of office of the Editor may be terminated 
at the discretion of the Board of Directors. 

2. Other publications may be originated by the Board of Directors as occasion arises. 


ARTICLE VI 


EXPULSION OR SUSPENSION 


1. Except for non-payment of dues, no one shall be expelled or suspended except by 
action of the Board of Directors with not more than one negative vote. 


ARTICLE VII 
AMENDMENTS 


1. This constitution may be amended by an affirmative two-thirds vote at any regu- 
larly convened meeting of the Institute provided notice of such proposed amendment 
shall have been sent to each voting member by the Secretary-Treasurer at least thirty 
days before the date of the meeting at which the proposal is to be acted upon. Voting 
may be in person or by mail. 







By-laws 
ARTICLE I 


DUTIES OF THE OFFICERS, THE EDITOR, BOARD OF DIRECTORS, AND 
COMMITTEE ON MEMBERSHIP 


1. The President, or in his absence, one of the Vice-Presidents, or in the absence of 
the President and both Vice-Presidents, a Fellow selected by vote of the Fellows present 
shall preside at the meetings of the Institute and of the Board of Directors. At meetings 
of the Institute, the presiding officer shall vote only in the case of a tie, but at meetings 
of the Board of Directors he may vote in all cases. At least three months before the date 
of the annual meeting, the President shall appoint a Nominating Committee of three 
members. It shall be the duty of the Nominating Committee to make nominations for 
Officers to be elected at the annual meeting and the Secretary-Treasurer shall notify all 
voting members at least thirty days before the annual meeting. Additional nominations 
may be submitted in writing, if signed by at least ten Fellows of the Institute, up to the 
time of the meeting. 

2. The Secretary-Treasurer shall keep a full and accurate record of the proceedings at 
the meetings of the Institute and of the Board of Directors, send out calls for said meet- 
ings and, with the approval of the President and the Board, carry on the correspondence 
of the Institute. Subject to the direction of the Board, he shall have charge of the ar- 
chives and other tangible and intangible property of the Institute and upon the direction 
of the Board he shall publish in the Annals of Mathematical Statistics a classified list of all 
Members and Fellows of the Institute. He shall send out calls for annual dues and ac- 
knowledge receipt of same; pay all bills approved by the President for expenditures 
authorized by the Board or the Institute; keep a detailed account of all receipts and ex- 
penditures, prepare a financial statement at the end of each year and present an abstract 
of the same at the annual meeting of the Institute after it has been audited by a Member 
or Fellow of the Institute appointed by the President as Auditor. The Auditors shall 
report to the President. 

3. Subject to the direction of the Board, the Editor shall be charged with the responsi- 
bility for all editorial matters concerning the editing of the Annals of Mathematical Sta- 
tistics. He shall, with the advice and consent of the Board, appoint an Editorial Com- 
mittee of not less than twelve members to co-operate with him; four for a period of five 
years, four for a period of three years, and the remaining members for a period of two 
years, appointments to be made annually as needed. All appointments to the Editorial 
Committee shall terminate with the appointment of a new Editor. The Editor shall 
serve as editorial adviser in the publication of all scientific monographs and pamphlets 
authorized by the Board. 

4. The Board of Directors shall have charge of the funds and of the affairs of the In- 
stitute, with the exception of those affairs specifically assigned to the President or to the 
Committee on Membership. The Board shall have authority to fill all vacancies ad 
interim, occurring among the Officers, Board of Directors, or in any of the Committees. 
The Board may appoint such other committees as may be required from time to time to 
carry on the affairs of the Institute. The power of election to the different grades of 
Membership, except the grades of Member and Junior Member, shall reside in the Board. 

5. The Committee on Membership shall prepare and make available through the 
Secretary-Treasurer an announcement indicating the qualifications requisite for the 
different grades of membership. The Committee shall review these qualifications periodi- 
cally and shall make such changes in these qualifications and make such recommendations 
with reference to the number of grades of membership as it deems advisable. The power 
to elect worthy applicants to the grades of Member and Junior Member shall reside in 
the Committee, which may delegate this power to the Secretary-Treasurer, subject to 
such reservations as the Committee considers appropriate. The Committee shall make 
recommendations to the Board of Directors with reference to placing members in other 
grades of membership. The Committee shall give its attention to the question of in- 
creasing the number of applicants for membership and shall advise the Secretary-Trea- 
surer on plans for that purpose. 


ARTICLE II 


DUES 





1. Members shall pay seven dollars at the time of admission to membership and shall 
receive the full current volume of the Official Journal. Thereafter, Members and Fellows 
shall pay seven dollars annual dues. Honorary members shall be exempt from all dues. 

A Sustaining Member shall pay annual dues of a multiple of one hundred dollars. 

An approved nominee of a Sustaining Member shall be a member in good standing 
without payment of dues for each year in which he is nominated provided that in that 
year he has been a member for less than three years. 

(a) Exception. In the case that two Members of the Institute are husband and wife 
and they elect to receive between them only one copy of the Official Journal, their dues 
shall each be reduced by twenty-five per cent. 

(b) Exception. Any Member or Fellow may make a single payment which will be 
accepted by the Institute in place of all succeeding annual dues and which will not other- 
wise alter his status as a Member or Fellow and will be based upon a suitable table and 
rate of interest, to be specified by the Board of Directors. 

(c) Exception. Any Member or Fellow of the Institute serving, except as a commis- 
sioned officer, in the Armed Forces of the United States, or of a friendly power, will, upon 
notification to the Secretary-Treasurer, be excused from the payment of dues until the 
January first following his discharge from service or his commissioning as an officer. He 
shall have all privileges of membership except that he shall not receive the Official Journal. 
However, during the first year of his resumed membership he may elect to receive one 
copy of each volume of the Official Journal published during the period of his service 
membership by paying one-half of the total of dues excused. 

(d) Exception. Anyone who resides outside the Western Hemisphere shall pay five 
dollars annual dues. 

2. Annual dues shall be payable on the first day of January of each year. 

3. Five dollars of the annual dues of each Member and Fellow shall be for a subscrip- 
tion to the Official Journal. Fifteen dollars of the dues of each Sustaining Member shall 
be for two subscriptions to the Official Journal, and the binding of one copy. 

4. For each one hundred dollars of annual dues, a Sustaining Member shall be entitled 
to nominate two persons for membership in the Institute. 

5. It shall be the duty of the Secretary-Treasurer to notify by mail anyone whose dues 
may be six months in arrears, and to accompany such a notice by a copy of this article. 
If such person fail to pay such dues within three months from the date of mailing such 
notice, the Secretary-Treasurer shall report the delinquent to the Board of Directors. 
The Board of Directors may strike the delinquent’s name from the rolls and withdraw 
all privileges of membership, and may reinstate the delinquent upon payment of arrears 
of dues. 


ARTICLE III 
SALARIES 


1. The Institute shall not pay a salary to any Officer, Director, or member of any 
committee. 


ARTICLE IV 


AMENDMENTS 


1. These By-Laws may be amended in the same manner as the Constitution or by a 
majority vote at any regularly convened meeting of the Institute, if the proposed amend- 
ment has been previously approved by the Board of Directors. 





